This dissertation studies the signal processing aspects of multi-input multi-output
(MIMO) communications. Its contribution is twofold.

First, this dissertation presents a new perspective on MIMO communications:
any MIMO scheme can be regarded as a MIMO channel decomposer, which decomposes
(in an information lossy or lossless manner) a MIMO channel into multiple scalar
subchannels. Based on this perspective, this dissertation presents three novel MIMO
transceiver designs: the geometric mean decomposition (GMD) scheme, the uniform channel
decomposition (UCD) scheme, and the tunable channel decomposition (TCD) scheme.
All these schemes deploy either a decision feedback equalizer (DFE) at the receiver or
a dirty paper precoder (DPP) at the transmitter. These transceiver designs represent a
paradigm shift from conventional linear MIMO transceiver designs to nonlinear
ones. The superior performance of the GMD and UCD schemes unveils the practical
significance of making the transmitter and receiver cooperate with each other: such
cooperation facilitates achieving the optimal tradeoff between the diversity gain and
the multiplexing gain promised by MIMO communication theory. The TCD scheme represents
a unifying solution to a considerably wide range of problems, including designing
the precoder for orthogonal frequency division multiplexing (OFDM) communications
and the optimal code division multiple access (CDMA) sequence design.

Second, this dissertation introduces two novel matrix decomposition algorithms, i.e.,
the geometric mean decomposition (GMD) and the generalized triangular decomposition
(GTD). The two matrix decompositions form the cornerstones of the three transceiver
designs proposed in this dissertation. Moreover, the two decompositions have significant
implications for the matrix analysis community. For instance, the GTD is a new solution
to the inverse eigenvalue problem.

CHAPTER 1
INTRODUCTION 

1.1    Two  Categories  of  Schemes  for  MIMO  Communications 

Communications over multiple-input multiple-output (MIMO) wireless channels
have been a subject of intense research over the past several years because deploying
multiple antennas at both the transmitter and receiver sides can drastically improve the
spectral efficiency [1] [2] [3] [4]. For example, in contrast to the conventional additive
white Gaussian noise (AWGN) channel whose spectral efficiency is

$$C(\mathrm{snr}) = \log_2(1 + \mathrm{snr}) \ \text{bps/Hz},$$

the MIMO channel with $M_t$ transmitting antennas and $M_r$ receiving antennas can,
without requiring additional input power, have spectral efficiency as large as [1] [2]

$$C(\mathrm{snr}) = \min\{M_r, M_t\} \log_2(\mathrm{snr}) + O(1) \ \text{bps/Hz},$$

given that there is plenty of scattering in the channel. Many spatial multiplexing
methods, e.g., the BLAST scheme [2] [5] [6] [7] [8] [9] [10] [11], have been proposed to reap
this large channel capacity.

Improving the data transmission reliability is another advantage of applying multiple
antennas in wireless communications. By transmitting the same information through
more than one independent fading channel, one can obtain much more reliable
communications thanks to the redundancy introduced. The space-time coding methods are
based on such a rationale (see, e.g., [12] [13] [14] [15]).

Zheng and Tse [16] show that one can exploit the diversity gain and the multiplexing
gain promised by the MIMO channel simultaneously. However, there is a fundamental
tradeoff between the two gains. Zheng and Tse's theory provides a unifying framework
for measuring the performance of any MIMO scheme. Hence designing practical schemes
capable of achieving the optimal diversity-multiplexing tradeoff is a central research
topic in MIMO communications.

1.2    Joint Transceiver Design: Where Tx and Rx Collaborate

All the aforementioned methods assume that the channel state information (CSI)
is available at the receiver (CSIR) only. Under this assumption, collaboration between
the transmitter and receiver is difficult in the physical layer. However, if the
communication environment is relatively stationary, CSI at the transmitter
(CSIT) can also be made available via feedback or, when time division duplex
(TDD) is used, via the reciprocity principle. In fact, the third generation WCDMA
standard [17] assumes CSIT in order to obtain improved system performance, in what is
referred to as the closed-loop transmit diversity or transmit adaptive array (TxAA)
technique. Based on this assumption, the joint optimal transceiver design (also referred
to as precoding at the transmitter and equalization at the receiver) has recently
attracted considerable attention [18] [19] [20] [21] [22] [23] [24] [25] [26] [27] [28] [29].

These designs are based on a variety of criteria, including minimum mean-squared
error (MMSE) [18] [21] [22], maximum SNR [21], maximum information rate [19] [20]
[22], and BER-based criteria [23] [24] [25] [29]. More recently, a unified framework has
been presented to accommodate all these criteria, under which the design problems can
be solved via convex optimization methods [26].

The aforementioned literature on joint transceiver design considers linear
transformations only. It is widely understood that the singular value decomposition (SVD),
which decomposes a MIMO channel into multiple parallel subchannels, combined with water
filling can be used to achieve the channel capacity [3]. However, because the
signal-to-noise ratios (SNRs) of the subchannels are usually very different, this
apparently simple scheme requires careful bit allocation (see, e.g., [19] [20] [23]) to
match the subchannel capacity and achieve a prescribed BER. Bit allocation not only
increases the coding/decoding complexity, but is also inherently capacity lossy because
of the finite constellation granularity. An alternative is to use the same constellation
in all the subchannels, like the schemes adopted by the European standard HIPERLAN/2 and
the IEEE 802.11 standards for wireless local area networks (WLANs). However, for this
alternative, the BER is dominated by the subchannels with the lowest SNRs. To optimize
the BER performance, more signal power could be allocated to the poorer subchannels. Yet
this approach causes significant capacity loss due to the "inverse water filling"-like
power allocation. There is apparently a fundamental tradeoff between the capacity and the
BER performance.

1.3    MIMO Transceiver Design from Channel Decomposition Perspective

In this dissertation, we present a new perspective on MIMO communications.
We regard the aforementioned MIMO schemes as MIMO channel decomposers, which
decompose (in an information lossy or lossless manner) a MIMO channel into multiple
scalar subchannels. For instance, the MIMO transceiver design based on SVD
decomposes a MIMO channel into multiple eigen-subchannels. Similarly, the V-BLAST scheme
decomposes a MIMO channel into multiple scalar subchannels which are referred to as
layers by its inventors. These channel decompositions, however, are totally determined
by the specific channel realization, and one has little control over how the channel
is decomposed. For example, the gains of the subchannels obtained via SVD are totally
determined by the singular values of the channel matrix, over which one has no control.

An interesting question arises: if the transmitter and receiver are allowed to
collaborate, how can we design a transceiver that decomposes a MIMO channel into
multiple subchannels with prescribed channel gains, without incurring capacity loss?
This dissertation is devoted to answering this question. In the process of pursuing the
answer, we investigate the following aspects of the problem.

First, we show that the conventional linear transceivers are inherently inflexible,
and we cannot rely on linear transceivers to achieve our desired channel
decompositions. Hence we need to go beyond the linearity constraint and investigate
nonlinear schemes, such as a decision feedback equalizer (DFE) and a dirty paper
precoder (DPP).

Second, we study the possibility of new matrix decompositions other than the
SVD. We propose two novel matrix decomposition algorithms, the geometric mean
decomposition (GMD) and the generalized triangular decomposition (GTD). The two
decompositions represent a wide class of matrix decompositions, which has significant
implications for the matrix analysis community. For instance, the GTD is a new solution
to the inverse eigenvalue problem.

Third, we propose three transceiver designs which combine the new matrix
decomposition algorithms with the DFE and DPP. The three designs are the GMD scheme, the
uniform channel decomposition (UCD) scheme, and the tunable channel decomposition
(TCD) scheme. Among them, the UCD scheme can decompose, in a strictly capacity
lossless manner, a MIMO channel into multiple subchannels with identical capacities or,
equivalently, identical channel gains. Moreover, the UCD scheme is a practical scheme
that can achieve the optimal tradeoff between the diversity gain and the multiplexing
gain. Without incurring any capacity loss, the TCD scheme can decompose a MIMO
channel into multiple subchannels with prescribed capacities/channel gains. This scheme
is applicable to a wide range of applications, including multi-task communications
where independent data streams with different qualities-of-service (QoS) share the same
MIMO channel, and the design of optimal CDMA sequences.

1.4    Dissertation  Outline 

In Chapter 2, we introduce the data model and some relevant information-theoretic
results that will be used in this dissertation. We also review the existing transceiver
designs and analyze the performance of those methods. By linking the channel capacity
with the Cramér-Rao bound (CRB), we give an information-theoretic explanation of why
linear transceivers are inflexible.

Chapter 3 presents the GMD scheme, which combines the V-BLAST detector or the DP
precoder with the GMD matrix decomposition algorithm. The GMD scheme can
decompose a MIMO channel into multiple identical scalar subchannels. This desirable
property brings much convenience to practical system design, particularly to symbol
constellation selection. Moreover, we show that the GMD scheme is asymptotically
optimal for high SNR in terms of both information rate and BER performance,
while its computational complexity is comparable to that of the conventional
linear transceiver scheme.

In Chapter 4, we propose a uniform channel decomposition (UCD) scheme. Similar
to the GMD scheme, the UCD scheme is based on the GMD matrix decomposition
algorithm and can decompose a MIMO channel into multiple identical subchannels. Two
remarkable merits of UCD, which are not shared by the GMD scheme, are that, first,
UCD is strictly capacity lossless at any SNR and, second, UCD can achieve the optimal
diversity-multiplexing tradeoff. Moreover, the UCD scheme can decompose
a MIMO channel into an arbitrarily large number of independent subchannels, which
is an enabling technology for achieving high data rate transmission using small symbol
constellations.

Chapter 5 is devoted to tackling a new aspect of the MIMO transceiver design
problem. Instead of attempting to optimize the BER performance for fixed input power
and data rate, we propose the TCD scheme, which can decompose a MIMO channel
into multiple subchannels with prescribed channel capacities. We show that TCD is a
solution to a wide range of applications, including applications in which independent
data streams with different qualities-of-service (QoS) share the same MIMO channel, and
the design of optimal CDMA sequences.

The  mathematical  foundations  of  this  dissertation,  the  GMD  and  GTD  algorithms, 
are  established  in  Chapter  6.  The  two  novel  matrix  decomposition  algorithms  are  the 
cornerstones  of  the  MIMO  transceiver  designs  proposed  in  this  dissertation. 

The  conclusions  are  given  in  Chapter  7. 

To read this dissertation, it is unnecessary to plunge into the details of the GMD
and GTD algorithms. For this reason, we place them in the latter part of the
dissertation. However, a rough understanding of the two algorithms is necessary to
appreciate Chapters 3-5.


CHAPTER  2 
LINEAR  MIMO  TRANSCEIVER  DESIGNS 

2.1    Channel  Model  and  Channel  Capacity 

2.1.1    Channel  Model 

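We consider a communication system with $M_t$ transmitting and $M_r$ receiving
antennas in a frequency flat fading channel. The sampled baseband signal is given by

$$\mathbf{y} = \mathbf{H}\mathbf{F}\mathbf{x} + \mathbf{z}, \tag{2.1}$$

where $\mathbf{x} \in \mathbb{C}^{L \times 1}$ is the vector of information symbols precoded by the
precoder $\mathbf{F} \in \mathbb{C}^{M_t \times L}$, $\mathbf{y} \in \mathbb{C}^{M_r \times 1}$ is the received signal, and
$\mathbf{H} \in \mathbb{C}^{M_r \times M_t}$ is the channel matrix with rank $K$. We assume
$E[\mathbf{x}\mathbf{x}^*] = \sigma_x^2 \mathbf{I}_L$ and that $\mathbf{z} \sim N(0, \sigma_z^2 \mathbf{I}_{M_r})$ is circularly symmetric complex
Gaussian noise, where $\mathbf{I}_L$ stands for an identity matrix with dimension $L$. We define the
input SNR as

$$\rho = \frac{E[\mathbf{x}^*\mathbf{F}^*\mathbf{F}\mathbf{x}]}{\sigma_z^2} = \frac{\sigma_x^2}{\sigma_z^2}\,\mathrm{Tr}\{\mathbf{F}\mathbf{F}^*\} \triangleq \frac{1}{\alpha}\,\mathrm{Tr}\{\mathbf{F}\mathbf{F}^*\}, \tag{2.2}$$

where $\alpha = \sigma_z^2/\sigma_x^2$. Designing the MIMO transceivers, including the precoder $\mathbf{F}$ and the
associated equalizer, is the focus of this dissertation.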

We note that the data model in (2.1) is generic. For an intersymbol-interference
(ISI) channel with impulse response $\mathbf{h} = [h_0, h_1, \ldots, h_{M-1}]^T$, with $(\cdot)^T$ denoting
transpose, if a block of data with length $N$ is transmitted using "zero-padded" OFDM,
then the received block of data can also be written in the form of (2.1) with

$$[\mathbf{H}]_{i,j} = h_{i-j}, \quad 1 \le i \le N + M - 1, \quad 1 \le j \le N, \tag{2.3}$$

where $h_k = 0$ for $k < 0$ or $k > M - 1$. In this case, $\mathbf{H}$ is a Toeplitz matrix with
dimensions $M_t = N$ and $M_r = N + M - 1$. If OFDM with cyclic prefix is used, the channel
matrix is a circulant Toeplitz matrix, i.e.,

$$[\mathbf{H}]_{i,j} = h_{(i-j) \bmod N}, \quad 1 \le i, j \le N. \tag{2.4}$$

Here, $M_t = M_r = N$. In either case, if the block data are precoded with the linear
precoder $\mathbf{F}$, then the received data are given in (2.1). This ISI channel problem has
been studied in [21] [30].
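
The two channel matrices described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the impulse response `h`, the block `x`, and the helper names `zero_padded_toeplitz`, `cyclic_prefix_circulant`, and `matvec` are hypothetical and do not come from the dissertation. The sketch builds the tall Toeplitz matrix for zero-padded transmission and the square circulant matrix for cyclic-prefix transmission, then applies both to one data block.

```python
def zero_padded_toeplitz(h, n):
    """(n + m - 1) x n Toeplitz matrix with entries h[r - c] (zero outside 0..m-1)."""
    m = len(h)
    return [[h[r - c] if 0 <= r - c < m else 0.0 for c in range(n)]
            for r in range(n + m - 1)]


def cyclic_prefix_circulant(h, n):
    """n x n circulant matrix with entries h[(r - c) mod n]; assumes n >= len(h)."""
    m = len(h)
    return [[h[(r - c) % n] if (r - c) % n < m else 0.0 for c in range(n)]
            for r in range(n)]


def matvec(a, x):
    """Plain matrix-vector product for lists of rows."""
    return [sum(a_rc * x_c for a_rc, x_c in zip(row, x)) for row in a]


h = [1.0, 0.5, 0.25]          # hypothetical impulse response, M = 3
x = [1.0, -1.0, 2.0, 0.5]     # one block of N = 4 symbols (no precoding, F = I)

H_zp = zero_padded_toeplitz(h, len(x))     # M_r = N + M - 1 = 6, M_t = N = 4
H_cp = cyclic_prefix_circulant(h, len(x))  # M_r = M_t = N = 4

y_zp = matvec(H_zp, x)   # noiseless output: linear convolution of h and x
y_cp = matvec(H_cp, x)   # noiseless output: circular convolution of h and x
```

With the zero-padded matrix the noiseless output is the linear convolution of `h` and `x`, while the circulant matrix wraps the last M - 1 output samples back onto the first ones.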

In an idealized synchronous CDMA (S-CDMA) system where the channel does
not experience any fading or near-far effect, $L$ mobile users modulate their information
symbols via spreading sequences $\{\mathbf{s}_l\}_{l=1}^{L}$, each of which has the processing gain $N$. The
discrete-time baseband S-CDMA signal received at the (single-antenna) base station can
be represented as [31]

$$\mathbf{y} = \mathbf{S}\mathbf{x} + \mathbf{z}, \tag{2.5}$$

where $\mathbf{S} = [\mathbf{s}_1, \ldots, \mathbf{s}_L] \in \mathbb{R}^{N \times L}$ and the $l$th ($1 \le l \le L$) entry of $\mathbf{x}$, $x_l$, stands for
the information symbol from the $l$th user. In the downlink channel, the base station
multiplexes the information dedicated to the $L$ mobile users through the spreading
sequences, which are the columns of $\mathbf{S}$. Then all the mobiles receive the same signal
given in (2.5). We remark that (2.5) can also be written as (2.1) with $\mathbf{H} = \mathbf{I}_N$ and $\mathbf{F} = \mathbf{S}$.
Here $M_r = M_t = N$ is the processing gain. Hence, optimizing the spreading sequences
amounts to optimizing the precoder $\mathbf{F}$ for a MIMO system. Indeed, this problem has
been under intensive research in the past several years.

In  summary,  both  designing  a  precoder  for  OFDM  transmission  through  an  ISI 
channel  and  searching  for  the  optimal  S-CDMA  sequences  can  be  regarded  as  special 
cases  in  the  unifying  framework  of  MIMO  transceiver  designs.  MIMO  transceiver  designs 
can  be  used  in  the  OFDM  and  CDMA  applications  after  only  simple  modifications.  In 
this  dissertation,  we  will  concentrate  on  MIMO  transceiver  design  although  we  will 
discuss  the  optimal  design  of  CDMA  sequences  in  Chapter  5. 

2.1.2    Channel Capacity

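Suppose $\mathbf{x}$ is a Gaussian random vector. The capacity of the MIMO channel (2.1)
is$^1$

$$C = \log_2 \frac{|\sigma_z^2 \mathbf{I} + \sigma_x^2 \mathbf{H}\mathbf{F}\mathbf{F}^*\mathbf{H}^*|}{|\sigma_z^2 \mathbf{I}|}, \tag{2.6}$$

where $|\cdot|$ denotes the determinant of a matrix. If both CSIT and CSIR are available, we
can maximize the channel capacity with respect to $\mathbf{F}$ given the input power constraint
$\sigma_x^2 \mathrm{Tr}\{\mathbf{F}\mathbf{F}^*\} = \rho\sigma_z^2$. That is,

$$C_{IT} = \max_{\sigma_x^2 \mathrm{Tr}\{\mathbf{F}\mathbf{F}^*\} = \rho\sigma_z^2} \log_2 |\mathbf{I} + \alpha^{-1}\mathbf{H}\mathbf{F}\mathbf{F}^*\mathbf{H}^*|, \tag{2.7}$$

where $\alpha$ is as defined in (2.2) and the subscript of $C_{IT}$ stands for "informed transmitter".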

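Denote the SVD of $\mathbf{H}$ as $\mathbf{H} = \mathbf{U}\mathbf{\Lambda}\mathbf{V}^*$, where $\mathbf{\Lambda}$ is a $K \times K$ diagonal matrix whose
diagonal elements $\{\lambda_{H,k}\}_{k=1}^{K}$ are the nonzero singular values of $\mathbf{H}$. The solution to $\mathbf{F}$ in
(2.7) is [3]

$$\mathbf{F} = \mathbf{V}\mathbf{\Phi}^{1/2}. \tag{2.8}$$

Here $\mathbf{\Phi}$ is a diagonal matrix whose $k$th ($1 \le k \le K$) diagonal element $\phi_k$ determines the
power loaded to the $k$th subchannel and is found via "water filling" to be

$$\phi_k(\mu) = \left(\mu - \frac{\alpha}{\lambda_{H,k}^2}\right)^+, \tag{2.9}$$

with $\mu$ being chosen such that $\sigma_x^2 \sum_{k=1}^{K} \phi_k(\mu) = \rho\sigma_z^2$ and $(a)^+ = \max\{0, a\}$. Then the
solution to (2.7) is

$$C_{IT} = \sum_{k=1}^{K} \log_2\left(1 + \alpha^{-1}\phi_k \lambda_{H,k}^2\right) \ \text{bps/Hz}. \tag{2.10}$$

Note that some of the $\phi_k$'s can be zero. In this case, we can only transmit $L < K$ data
streams.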
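
The water-filling rule in (2.9) and the resulting rate in (2.10) can be illustrated with a short Python sketch. The function names `water_filling` and `capacity_informed`, and the example singular values, are assumptions made here for illustration. The water level is found by the standard approach of trying the largest set of active subchannels first and dropping the weakest one until the level exceeds every active noise floor.

```python
import math


def water_filling(sing_vals, rho, alpha):
    """Return (mu, phi) with phi_k = max(0, mu - alpha / lambda_k**2)
    and sum(phi) = rho * alpha, i.e. the power constraint below (2.9)."""
    floors = sorted(alpha / lam ** 2 for lam in sing_vals)  # ascending noise floors
    budget = rho * alpha
    for used in range(len(floors), 0, -1):
        # Water level if exactly the `used` strongest subchannels are active.
        mu = (budget + sum(floors[:used])) / used
        if mu > floors[used - 1]:
            break
    phi = [max(0.0, mu - alpha / lam ** 2) for lam in sing_vals]
    return mu, phi


def capacity_informed(sing_vals, rho, alpha):
    """Evaluate (2.10) in bps/Hz for the water-filling power loading."""
    _, phi = water_filling(sing_vals, rho, alpha)
    return sum(math.log2(1 + p * lam ** 2 / alpha)
               for p, lam in zip(phi, sing_vals))


# Hypothetical channel: one subchannel is too weak to be used at this SNR.
mu, phi = water_filling([2.0, 1.0, 0.1], rho=1.0, alpha=1.0)
```

In this example the third subchannel receives no power, so only two data streams would be transmitted, which is the situation L < K mentioned above.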

If CSIT is not available, the optimal transmission strategy is to allocate power
evenly to each antenna [3]. In this case, $\mathbf{F} = \sqrt{\rho\alpha / M_t}\, \mathbf{I}_{M_t}$ and the channel capacity with
uninformed transmitter (UT) is

$$C_{UT} = \sum_{n=1}^{K} \log_2\left(1 + \frac{\rho}{M_t}\lambda_{H,n}^2\right) \ \text{bps/Hz}. \tag{2.11}$$


$^1$ Throughout this dissertation, we assume that the coherence time of the channel goes
to infinity. Hence advanced coding is applicable to approach the Shannon capacity.


It is proven [32] that if $K = M_t$,

$$\frac{C_{IT}}{C_{UT}} \to 1 \quad \text{as} \quad \rho \to \infty. \tag{2.12}$$

We claim a stronger relationship as follows.$^2$

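Lemma 2.1.1 For the data model in (2.1), if the channel matrix $\mathbf{H}$ is of full column
rank, i.e., $K = M_t$, then

$$C_{IT} - C_{UT} \to 0 \quad \text{as} \quad \rho \to \infty. \tag{2.13}$$

Proof: Inserting (2.9) into (2.10) yields

$$C_{IT} = \sum_{n=1}^{K} \log_2\left(\frac{\mu}{\alpha}\lambda_{H,n}^2\right), \tag{2.14}$$

where $\mu$ is chosen such that

$$\sum_{n=1}^{K}\left(\mu - \frac{\alpha}{\lambda_{H,n}^2}\right) = \rho\alpha, \tag{2.15}$$

or

$$\mu = \frac{\alpha}{K}\left(\rho + \sum_{n=1}^{K} \frac{1}{\lambda_{H,n}^2}\right). \tag{2.16}$$

Here we assume that all the $K$ subchannels are used because of large $\rho$, i.e., $\mu - \alpha/\lambda_{H,n}^2 > 0$
for $n = 1, 2, \ldots, K$.

From (2.14), (2.16), and (2.11), and using the fact that $K = M_t$, we have

$$C_{IT} - C_{UT} = \sum_{n=1}^{K} \log_2 \frac{\rho + \sum_{k=1}^{K} \lambda_{H,k}^{-2}}{\rho + K\lambda_{H,n}^{-2}}. \tag{2.17}$$

Note that

$$\lim_{\rho \to \infty} \frac{\rho + \sum_{k=1}^{K} \lambda_{H,k}^{-2}}{\rho + K\lambda_{H,n}^{-2}} = 1 \quad \text{for} \quad 1 \le n \le K \tag{2.18}$$

and that $f(x) = \log_2 x$ is a continuous function for $x > 0$. The lemma follows immediately
from (2.17). ■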
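
Lemma 2.1.1 can be checked numerically. The sketch below is an illustration under stated assumptions: the singular values are hypothetical, the function names are introduced here, and the informed-transmitter rate uses the high-SNR form (2.14) with the water level (2.16), so it is valid only when all subchannels are active.

```python
import math


def c_it_high_snr(sing_vals, rho):
    """(2.14) with mu from (2.16); alpha cancels. Assumes all subchannels are used."""
    k = len(sing_vals)
    inv_sq = sum(lam ** -2 for lam in sing_vals)
    return sum(math.log2((rho + inv_sq) * lam ** 2 / k) for lam in sing_vals)


def c_ut(sing_vals, rho):
    """(2.11) for a full column rank channel, so that K = M_t."""
    m_t = len(sing_vals)
    return sum(math.log2(1 + rho * lam ** 2 / m_t) for lam in sing_vals)


lams = [2.0, 1.0, 0.5]   # hypothetical singular values; all active for rho > 6.75
gaps = [c_it_high_snr(lams, rho) - c_ut(lams, rho) for rho in (1e2, 1e3, 1e4)]
```

For these values the gap is already below 0.01 bps/Hz at an SNR of 20 dB and keeps shrinking as the SNR grows, in line with (2.13).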
However, we note that CSIT can be very helpful in the following cases:

A. The SNR is low or moderate.

B. H is rank deficient or ill-conditioned.

C. There are more transmitting antennas than receiving ones, i.e., $M_t > M_r$.

Moreover, the availability of CSIT provides more freedom, which makes it easier to
devise joint transceiver design schemes to achieve the underlying channel capacity. This
observation is the underlying theme of this dissertation.

$^2$ A similar, but somewhat vague, statement is found in [8].

2.2    Channel Capacity and Cramér-Rao Bound

One of the most significant consequences of Shannon's information theory is that
it predicts the highest achievable data rate for a given channel. Similarly,
the Cramér-Rao bound (CRB) [33], which is the inverse of the Fisher information matrix
(FIM), predicts the minimum mean squared error (MMSE) an estimator can achieve.
In this section, we show that the MIMO channel capacity formula of (2.6) can be
rewritten as a function of the CRB, or FIM. Based on this relationship, we show that linear
transceivers lack flexibility.

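We rewrite (2.1) as follows:

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{z}, \tag{2.19}$$

but we relax the assumption of (2.1) slightly. Instead of assuming spatially white noise,
we assume that $\mathbf{z} \sim N(0, \mathbf{R}_z)$. We also assume that the channel input $\mathbf{x} \sim N(0, \mathbf{R}_x)$
has a circularly symmetric complex Gaussian distribution and is independent of $\mathbf{z}$.
Then the channel output $\mathbf{y} \sim N(0, \mathbf{H}\mathbf{R}_x\mathbf{H}^* + \mathbf{R}_z)$. For this more general scenario, the
channel capacity is

$$C = \log_2 \frac{|\mathbf{R}_z + \mathbf{H}\mathbf{R}_x\mathbf{H}^*|}{|\mathbf{R}_z|}. \tag{2.20}$$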

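Now consider the following random vector,

$$\begin{bmatrix} \mathbf{x} \\ \mathbf{y} \end{bmatrix} \sim N\left(0, \begin{bmatrix} \mathbf{R}_x & \mathbf{R}_x\mathbf{H}^* \\ \mathbf{H}\mathbf{R}_x & \mathbf{R}_y \end{bmatrix}\right), \tag{2.21}$$

where $\mathbf{R}_y = \mathbf{H}\mathbf{R}_x\mathbf{H}^* + \mathbf{R}_z$ is the covariance matrix of $\mathbf{y}$. Its log-likelihood function is

$$\log f(\mathbf{x}, \mathbf{y}) = -\text{const} - \begin{bmatrix} \mathbf{x}^* & \mathbf{y}^* \end{bmatrix} \begin{bmatrix} \mathbf{R}_x & \mathbf{R}_x\mathbf{H}^* \\ \mathbf{H}\mathbf{R}_x & \mathbf{R}_y \end{bmatrix}^{-1} \begin{bmatrix} \mathbf{x} \\ \mathbf{y} \end{bmatrix}. \tag{2.22}$$

Using the block matrix inversion formula [34], we get

$$\begin{bmatrix} \mathbf{R}_x & \mathbf{R}_x\mathbf{H}^* \\ \mathbf{H}\mathbf{R}_x & \mathbf{R}_y \end{bmatrix}^{-1} = \begin{bmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{B}^* & \circ \end{bmatrix}, \tag{2.23}$$

where

$$\mathbf{A} = (\mathbf{R}_x - \mathbf{R}_x\mathbf{H}^*\mathbf{R}_y^{-1}\mathbf{H}\mathbf{R}_x)^{-1}, \tag{2.24}$$

$$\mathbf{B} = -(\mathbf{R}_x - \mathbf{R}_x\mathbf{H}^*\mathbf{R}_y^{-1}\mathbf{H}\mathbf{R}_x)^{-1}\mathbf{R}_x\mathbf{H}^*\mathbf{R}_y^{-1}, \tag{2.25}$$

and $\circ$ is irrelevant to the present discussion. From (2.22)-(2.25) we have

$$-\frac{\partial \log f(\mathbf{x}, \mathbf{y})}{\partial \bar{\mathbf{x}}} = (\mathbf{R}_x - \mathbf{R}_x\mathbf{H}^*\mathbf{R}_y^{-1}\mathbf{H}\mathbf{R}_x)^{-1}(\mathbf{x} - \mathbf{R}_x\mathbf{H}^*\mathbf{R}_y^{-1}\mathbf{y}), \tag{2.26}$$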


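where $\bar{\mathbf{x}}$ is the conjugate of $\mathbf{x}$. Here we define the differentiation with respect to a
complex-valued vector as [35, Appendix B]

$$\frac{\partial}{\partial \bar{\mathbf{w}}} = \frac{1}{2}\begin{bmatrix} \dfrac{\partial}{\partial u_1} + j\dfrac{\partial}{\partial v_1} \\ \vdots \\ \dfrac{\partial}{\partial u_M} + j\dfrac{\partial}{\partial v_M} \end{bmatrix}, \tag{2.27}$$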

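where the $m$th entry of $\mathbf{w}$ is $w_m = u_m + jv_m$, $m = 1, \ldots, M$. The Bayesian Fisher
information matrix (FIM) [36] is given by

$$\mathrm{FIM} = E\left[\frac{\partial \log f(\mathbf{x}, \mathbf{y})}{\partial \bar{\mathbf{x}}}\left(\frac{\partial \log f(\mathbf{x}, \mathbf{y})}{\partial \bar{\mathbf{x}}}\right)^*\right]. \tag{2.28}$$

Based on (2.26) and (2.21), we obtain

$$\mathrm{FIM} = \mathbf{A}\begin{bmatrix} \mathbf{I} & -\mathbf{R}_x\mathbf{H}^*\mathbf{R}_y^{-1} \end{bmatrix} \begin{bmatrix} \mathbf{R}_x & \mathbf{R}_x\mathbf{H}^* \\ \mathbf{H}\mathbf{R}_x & \mathbf{R}_y \end{bmatrix} \begin{bmatrix} \mathbf{I} \\ -\mathbf{R}_y^{-1}\mathbf{H}\mathbf{R}_x \end{bmatrix}\mathbf{A}$$

$$= (\mathbf{R}_x - \mathbf{R}_x\mathbf{H}^*\mathbf{R}_y^{-1}\mathbf{H}\mathbf{R}_x)^{-1} \tag{2.29}$$

$$= \mathbf{R}_x^{-1} + \mathbf{H}^*(\mathbf{R}_y - \mathbf{H}\mathbf{R}_x\mathbf{H}^*)^{-1}\mathbf{H} \tag{2.30}$$

$$= \mathbf{R}_x^{-1} + \mathbf{H}^*\mathbf{R}_z^{-1}\mathbf{H}. \tag{2.31}$$

Comparing (2.20) and (2.31), we see that

$$C = \log_2|\mathbf{R}_x| + \log_2|\mathrm{FIM}| \tag{2.32}$$

$$= \log_2|\mathbf{R}_x| - \log_2|\mathrm{CRB}|, \tag{2.33}$$

where

$$\mathrm{CRB} = \mathrm{FIM}^{-1} = \mathbf{R}_x - \mathbf{R}_x\mathbf{H}^*\mathbf{R}_y^{-1}\mathbf{H}\mathbf{R}_x. \tag{2.34}$$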

This shows that there exists a simple relation between the Gaussian MIMO channel
throughput, which is an upper bound on the information transmission rate for any
coder/decoder, and the CRB, which is a lower bound on the covariance matrix of any
unbiased estimator of $\mathbf{x}$.
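
The relation between (2.20) and (2.33) is easy to confirm numerically. The following NumPy sketch uses a randomly drawn channel and random covariance matrices; the dimensions, the seed, and the helper names `random_covariance` and `log2det` are assumptions made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m_r, m_t = 4, 3


def random_covariance(n):
    """Random Hermitian positive definite matrix (hypothetical test data)."""
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return a @ a.conj().T + n * np.eye(n)


def log2det(m):
    """log2 of |det(m)|, computed stably via slogdet."""
    _, logabsdet = np.linalg.slogdet(m)
    return logabsdet / np.log(2)


H = rng.standard_normal((m_r, m_t)) + 1j * rng.standard_normal((m_r, m_t))
R_x, R_z = random_covariance(m_t), random_covariance(m_r)
R_y = H @ R_x @ H.conj().T + R_z

fim = np.linalg.inv(R_x) + H.conj().T @ np.linalg.solve(R_z, H)   # (2.31)
crb = R_x - R_x @ H.conj().T @ np.linalg.solve(R_y, H @ R_x)      # (2.34)

capacity = log2det(R_y) - log2det(R_z)              # (2.20)
capacity_from_crb = log2det(R_x) - log2det(crb)     # (2.33)
```

Both expressions give the same number of bits per channel use, and the product of `crb` and `fim` is the identity matrix up to rounding error.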

The MMSE estimator of $\mathbf{x}$ is

$$\hat{\mathbf{x}}_{\mathrm{MMSE}} = \mathbf{R}_x\mathbf{H}^*(\mathbf{H}\mathbf{R}_x\mathbf{H}^* + \mathbf{R}_z)^{-1}\mathbf{y}. \tag{2.35}$$

It is easy to verify that the MMSE estimator of $\mathbf{x}$ achieves the CRB. Hence the MMSE
estimator is the best we can achieve under the Gaussian assumptions. In general,
the matrices FIM and CRB are non-diagonal; i.e., the MMSE estimates of the elements
of $\mathbf{x}$ are correlated. The correlations between the elements of $\mathbf{x}$ clearly contain useful
information for the subsequent decoding procedures. However, in practice, we only
estimate the single elements of $\mathbf{x}$ separately and ignore the correlations between these
elements. This causes a loss of information. In fact, we can quantify the capacity loss
as

$$\sum_{k=1}^{M_t} \log \mathrm{CRB}_{kk} - \log|\mathrm{CRB}|, \tag{2.36}$$

where $\mathrm{CRB}_{kk}$ denotes the $k$th diagonal element of CRB. According to the Hadamard
inequality [34], for any positive semidefinite matrix $\mathbf{M} \in \mathbb{C}^{K \times K}$,

$$|\mathbf{M}| \le \prod_{k=1}^{K} M_{kk}, \tag{2.37}$$

and the equality holds if and only if $\mathbf{M}$ is diagonal. Hence the capacity loss in (2.36)
is nonnegative, and there is no capacity loss if and only if CRB is a diagonal matrix.
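
The capacity loss in (2.36) can be evaluated for a concrete channel. In the NumPy sketch below the channel is random, the logarithm is taken to base 2 so that the loss is in bits (an assumption, since (2.36) does not fix the base), and the function name `separate_decoding_loss` is introduced here for illustration.

```python
import numpy as np


def separate_decoding_loss(H, R_x, R_z):
    """Evaluate (2.36) in bits: sum_k log2 CRB_kk - log2 |CRB|."""
    fim = np.linalg.inv(R_x) + H.conj().T @ np.linalg.solve(R_z, H)   # (2.31)
    crb = np.linalg.inv(fim)
    diag = np.real(np.diag(crb))
    _, logabsdet = np.linalg.slogdet(crb)
    return float(np.sum(np.log2(diag)) - logabsdet / np.log(2))


rng = np.random.default_rng(1)
H = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
I3, I4 = np.eye(3), np.eye(4)

# Generic channel: non-diagonal CRB, strictly positive loss.
loss_generic = separate_decoding_loss(H, I3, I4)

# Channel with orthonormal columns: diagonal CRB, no loss.
Q, _ = np.linalg.qr(H)
loss_orthogonal = separate_decoding_loss(Q, I3, I4)
```

The second call illustrates the equality case of the Hadamard inequality: when the columns of the channel matrix are orthogonal and the input and noise are white, the CRB is diagonal and the loss vanishes.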

Based on the aforementioned discussions, we see that i) in general MIMO
communications, linear MMSE estimators followed by separate substream decoding are not
capacity-wise optimal and ii) if the channel matrix $\mathbf{H}$ has the property that the CRB of
(2.34) is a diagonal matrix, linear MMSE estimators may be the first step of capacity
lossless processing. If CSIT is available, the transmitter can apply some precoder $\mathbf{F}$ and
get a virtual channel matrix

$$\mathbf{H}_v = \mathbf{H}\mathbf{F} \tag{2.38}$$

such that CRB is diagonal. This explains why all the existing linear transceiver designs
invariably lead to the diagonalization of the channel matrix. Indeed, if $\mathbf{R}_x$ is diagonal
and $\mathbf{R}_z = \sigma_z^2\mathbf{I}$, then it follows from (2.31) that $\mathbf{H}_v$ must have orthogonal columns
to get a diagonal FIM and hence a diagonal CRB. Then the precoder $\mathbf{F} = \mathbf{V}$, whose columns
are the right singular vectors of $\mathbf{H}$, is the only optimal solution. Yet as we discussed
before, this inflexible transceiver scheme can bring many difficulties to the subsequent
coding/decoding and modulation/demodulation procedures.
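
The diagonalizing effect of the precoder F = V can be seen in a small NumPy sketch. The channel, the noise variance `sigma_z2`, and the choice of a unit-power white input are assumptions made for this illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
H = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
sigma_z2 = 0.5                       # hypothetical noise variance

# Thin SVD: H = U diag(lam) Vh, so V = Vh^* holds the right singular vectors.
U, lam, Vh = np.linalg.svd(H, full_matrices=False)
F = Vh.conj().T                      # precoder F = V
H_v = H @ F                          # virtual channel (2.38)

# FIM from (2.31) with R_x = I and R_z = sigma_z2 * I.
fim = np.eye(3) + H_v.conj().T @ H_v / sigma_z2
off_diagonal = fim - np.diag(np.diag(fim))
```

Because the columns of `H_v` are orthogonal, `off_diagonal` is zero up to rounding error and the diagonal of `fim` is determined entirely by the singular values, which is the inflexibility discussed above.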

2.3    Rate  Performance  of  Linear  Transceivers 

To gain more insight into the limitations of the linear transceiver designs, we
analyze the asymptotic rate performance of two typical linear transceiver designs for high
SNR. We will show that the linear transceivers may suffer from considerable capacity
loss and that there is apparently a fundamental tradeoff between the throughput and the
BER performance.

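According to the channel model of (2.1), the received data vector is

$$\mathbf{y} = \mathbf{H}\mathbf{F}\mathbf{x} + \mathbf{z}. \tag{2.39}$$

The optimal linear receiver is always the LMMSE equalizer (also see, e.g., [23])

$$\mathbf{G}_{\mathrm{opt}} = \mathbf{F}^*\mathbf{H}^*(\mathbf{H}\mathbf{F}\mathbf{F}^*\mathbf{H}^* + \alpha\mathbf{I})^{-1}, \tag{2.40}$$

which yields the optimal estimate of the information symbol $\mathbf{s} = \mathbf{G}_{\mathrm{opt}}\mathbf{y}$. The
mean-squared-error (MSE) matrix of $\mathbf{s}$ is

$$\mathbf{E} = (\mathbf{I} + \alpha^{-1}\mathbf{F}^*\mathbf{H}^*\mathbf{H}\mathbf{F})^{-1}. \tag{2.41}$$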
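
The link between the equalizer (2.40) and the MSE matrix (2.41) can be verified numerically. The NumPy sketch below assumes a random channel and a random (not optimized) precoder, and normalizes the symbol power to one so that the noise power equals alpha; these choices and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
m_r, m_t, n_streams = 4, 3, 3
alpha = 0.1   # sigma_z^2 / sigma_x^2, with sigma_x^2 normalized to 1

H = rng.standard_normal((m_r, m_t)) + 1j * rng.standard_normal((m_r, m_t))
F = rng.standard_normal((m_t, n_streams)) + 1j * rng.standard_normal((m_t, n_streams))
HF = H @ F

# LMMSE equalizer (2.40) and MSE matrix (2.41).
G_opt = HF.conj().T @ np.linalg.inv(HF @ HF.conj().T + alpha * np.eye(m_r))
E = np.linalg.inv(np.eye(n_streams) + HF.conj().T @ HF / alpha)

# Error covariance of s = G_opt y computed directly from the model y = HF x + z:
# s - x = (G_opt HF - I) x + G_opt z.
I_L = np.eye(n_streams)
residual = I_L - G_opt @ HF
err_cov = residual @ residual.conj().T + alpha * G_opt @ G_opt.conj().T
```

The directly computed error covariance `err_cov` coincides with `E`, which is the normalized form of the MSE matrix used in the remainder of this section.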

Note that $\mathbf{E}$ is a function of the linear precoder $\mathbf{F}$. In the following, we analyze two linear
precoder designs based on the minimization of the trace of the MSE matrix (MTM) and
the minimization of the maximum diagonal element of the MSE matrix (MMD),
which are referred to as ARITH-MSE and MAX-MSE in [26], respectively. We choose
these two schemes because they appear to be the most typical ones and because the MMD
scheme yields the optimal (or very close to optimal) performance among all
linear transceivers. Indeed, the MMD scheme is equivalent to the linear MIN-BER scheme in
the flat fading channel case (see [26]). We do not consider the SVD plus water filling
scheme here since it requires complicated bit loading.

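The MTM scheme, or ARITH-MSE, which has appeared in several linear transceiver
design papers (see, e.g., [22] [23] and [26]), attempts to minimize $\mathrm{tr}(\mathbf{E})$ with respect to
$\mathbf{F}$. The MTM precoder turns out to be

$$\mathbf{F}_{\mathrm{MTM}} = \mathbf{V}\mathbf{\Phi}^{1/2}, \tag{2.42}$$

where $\mathbf{V}$ is as defined in the SVD $\mathbf{H} = \mathbf{U}\mathbf{\Lambda}\mathbf{V}^*$, and $\mathbf{\Phi}$ is a diagonal matrix whose $i$th
diagonal element $\phi_i$ denotes the signal power loaded to the $i$th subchannel. According
to the literature (see, e.g., [23] Sec. III-A),

$$\phi_i(\mu) = \left(\mu^{-1/2}\alpha^{1/2}\lambda_{H,i}^{-1} - \alpha\lambda_{H,i}^{-2}\right)^+, \tag{2.43}$$

where $\mu$ is the Lagrange multiplier which controls the loaded power such that
$\sum_{i=1}^{K}\phi_i = \rho\alpha$. Suppose $\rho$ is sufficiently large. Then all the $K$ subchannels are used and

$$\sum_{i=1}^{K}\left(\mu^{-1/2}\alpha^{1/2}\lambda_{H,i}^{-1} - \alpha\lambda_{H,i}^{-2}\right) = \rho\alpha, \tag{2.44}$$

or

$$\mu^{-1/2} = \alpha^{1/2}\,\frac{\rho + \sum_{i=1}^{K}\lambda_{H,i}^{-2}}{\sum_{i=1}^{K}\lambda_{H,i}^{-1}}. \tag{2.45}$$

Substituting (2.42), (2.43) and (2.45) into (2.41), we see that $\mathbf{E}$ is diagonal with the $i$th
diagonal element

$$E_i = \frac{\sum_{k=1}^{K}\lambda_{H,k}^{-1}}{\lambda_{H,i}\left(\rho + \sum_{k=1}^{K}\lambda_{H,k}^{-2}\right)}. \tag{2.46}$$

Then (cf. Equation (28) of [26])

$$C_i = -\log_2 E_i \tag{2.47}$$

$$= \log_2\left(\frac{\rho + \sum_{k=1}^{K}\lambda_{H,k}^{-2}}{\sum_{k=1}^{K}\lambda_{H,k}^{-1}}\right) + \log_2\lambda_{H,i}. \tag{2.48}$$

Hence the sum rate of the channel using the MTM scheme is

$$C_{\mathrm{MTM}} = \sum_{i=1}^{K} C_i = K\log_2\left(\frac{\rho + \sum_{k=1}^{K}\lambda_{H,k}^{-2}}{\sum_{k=1}^{K}\lambda_{H,k}^{-1}}\right) + \sum_{i=1}^{K}\log_2\lambda_{H,i}. \tag{2.49}$$

The channel capacity with uniform power loading in the $K$ subchannels is

$$C_{\mathrm{UPL}} = \sum_{i=1}^{K}\log_2\left(1 + \frac{\rho}{K}\lambda_{H,i}^2\right). \tag{2.50}$$

Here $C_{\mathrm{UPL}}$ is different from $C_{UT}$ defined in (2.11) in that $C_{\mathrm{UPL}}$ corresponds to the
channel with the transmitter knowing the range space of $\mathbf{H}$.
It follows from (2.49) and (2.50) that

$$C_{\mathrm{UPL}} - C_{\mathrm{MTM}} = \sum_{i=1}^{K}\log_2\left(1 + \frac{\rho}{K}\lambda_{H,i}^2\right) - K\log_2\left(\frac{\rho + \sum_{k=1}^{K}\lambda_{H,k}^{-2}}{\sum_{k=1}^{K}\lambda_{H,k}^{-1}}\right) - \sum_{i=1}^{K}\log_2\lambda_{H,i}. \tag{2.51}$$

After some straightforward calculations, we have

$$\lim_{\rho\to\infty}\left(C_{\mathrm{UPL}} - C_{\mathrm{MTM}}\right) = K\log_2\frac{\frac{1}{K}\sum_{i=1}^{K}\lambda_{H,i}^{-1}}{\left(\prod_{i=1}^{K}\lambda_{H,i}^{-1}\right)^{1/K}} \ \text{bps/Hz}. \tag{2.52}$$

Note that for any real-valued sequence $\{\lambda_{H,i}^{-1}\}_{i=1}^{K} > 0$, the arithmetic mean is greater than
or equal to the geometric mean, or $\frac{1}{K}\sum_{i=1}^{K}\lambda_{H,i}^{-1} \ge \left(\prod_{i=1}^{K}\lambda_{H,i}^{-1}\right)^{1/K}$. Hence we conclude
that $\lim_{\rho\to\infty}(C_{\mathrm{UPL}} - C_{\mathrm{MTM}}) \ge 0$ and the equality holds if and only if $\{\lambda_{H,i}\}_{i=1}^{K}$ are all
the same. We infer from (2.52) that the capacity loss of the MTM transceiver can be
quite large if the channel matrix $\mathbf{H}$ has a large condition number, which is verified in
Section 3.4.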
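
The asymptotic gap in (2.52) can be checked with a few lines of Python. The singular values below are hypothetical, and the function names are introduced here for illustration; the MTM rate is evaluated with (2.49), which assumes that all subchannels are active.

```python
import math


def c_upl(lams, rho):
    """Uniform power loading over the K subchannels, (2.50)."""
    k = len(lams)
    return sum(math.log2(1 + rho * lam ** 2 / k) for lam in lams)


def c_mtm(lams, rho):
    """Sum rate of the MTM scheme, (2.49); assumes all subchannels are used."""
    k = len(lams)
    s1 = sum(1 / lam for lam in lams)
    s2 = sum(1 / lam ** 2 for lam in lams)
    return k * math.log2((rho + s2) / s1) + sum(math.log2(lam) for lam in lams)


def asymptotic_gap(lams):
    """High-SNR limit of C_UPL - C_MTM, (2.52): K log2(arithmetic / geometric mean)."""
    k = len(lams)
    arith = sum(1 / lam for lam in lams) / k
    geom = math.prod(1 / lam for lam in lams) ** (1 / k)
    return k * math.log2(arith / geom)


lams = [2.0, 1.0, 0.5]                          # hypothetical singular values
gap_at_80_db = c_upl(lams, 1e8) - c_mtm(lams, 1e8)
limit = asymptotic_gap(lams)
```

For these singular values the limit is about 0.67 bps/Hz, and it drops to zero when all singular values are equal, in agreement with the arithmetic-geometric mean argument above.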

If  the  same  constellation  is  used  for  each  subchannel,  then  the  substream  cor- 
responding to  the  largest  Ei  dominates  the  overall  BER  performance.  Recall  that 
Ei  =  ^^k'\-2'\. — ,  which  is  proportional  to  the  inverse  of  Xni-  Hence  the  sub- 
channels  may  have  very  different  SNRs  especially  when  H  has  a  large  condition  number. 
To  mitigate  this  undesirable  effect,  one  can  use  the  MMD  transceiver,  or  MAX-MSE 
(cf.  [26]  Section  V-A5),  with 

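\[ F_{MMD} = F_{MTM} \Theta, \tag{2.53} \]

where $\Theta$ is a unitary matrix that makes all the diagonal elements of E in (2.41) the
same, that is, equal to

\[ \bar{E} = \frac{1}{K} \sum_{i=1}^{K} E_i. \tag{2.54} \]

According to (2.47), the capacity of the channel using the MMD linear transceiver is

\[ C_{MMD} = -K \log_2 \bar{E} = -K \log_2 \left( \frac{1}{K} \sum_{i=1}^{K} E_i \right). \tag{2.55} \]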

Thus

\[ C_{MTM} - C_{MMD} = K \log_2 \left( \frac{1}{K} \sum_{i=1}^{K} E_i \right) - \sum_{i=1}^{K} \log_2 E_i \tag{2.56} \]

\[ = K \log_2 \frac{\frac{1}{K} \sum_{i=1}^{K} E_i}{\left( \prod_{i=1}^{K} E_i \right)^{1/K}} \tag{2.57} \]

\[ = K \log_2 \frac{\frac{1}{K} \sum_{i=1}^{K} \lambda_{H,i}^{-1}}{\left( \prod_{i=1}^{K} \lambda_{H,i}^{-1} \right)^{1/K}} \ \text{bps/Hz}, \tag{2.58} \]

where to get (2.58) from (2.57) we have used (2.46). Note that the capacity loss of MMD
relative to MTM is independent of SNR, given that all the subchannels are used.
Interestingly, we can see from (2.58) and (2.52) that
$C_{MTM} - C_{MMD} = \lim_{\rho \to \infty} (C_{UPL} - C_{MTM})$. We conclude that asymptotically for
high SNR, the MMD transceiver has twice the capacity loss of MTM, i.e.,

\[ \lim_{\rho \to \infty} C_{UPL} - C_{MMD} = 2K \log_2 \frac{\frac{1}{K} \sum_{i=1}^{K} \lambda_{H,i}^{-1}}{\left( \prod_{i=1}^{K} \lambda_{H,i}^{-1} \right)^{1/K}} \ \text{bps/Hz}, \tag{2.59} \]

although it may yield better BER performance. An intuitive explanation of the capacity
loss of the MMD transceiver is as follows. Note that the only difference between MTM
and MMD is the prerotation matrix $\Theta$, which is an invariant operator in terms of
information capacity. However, $\Theta$ makes the MSE matrix E non-diagonal, which means
that the elements of $s = G_{opt} y$ are correlated. Clearly, the correlation contains useful
information for symbol detection and decoding. However, the linear equalizer ignores
the correlation, which results in the additional capacity loss quantified in (2.58). The
analyses here are verified in Section 3.4.

In summary, the MTM transceiver suffers from the capacity loss of (2.52) due to the
information-theoretically non-optimal power loading defined in (2.43). The MMD
transceiver suffers from additional capacity loss because it makes the MSE matrix
non-diagonal. Hence there is an apparently inevitable tradeoff between the information
rate and BER performance if the same symbol constellation is used in the different
subchannels. In the next chapter, we will introduce the GMD scheme and clarify that there
is not necessarily a tradeoff between BER performance and channel capacity. Indeed,
the GMD scheme attempts to achieve the best of both worlds simultaneously.


CHAPTER  3 

MIMO  TRANSCEIVER  DESIGN  USING  GEOMETRIC  MEAN  DECOMPOSITION 

3.1    VBLAST  and  ZF-DP 
In  this  section,  we  first  give  a  brief  introduction  to  the  VBLAST  architecture  [5], 
which  is  equivalent  to  the  generalized  decision  feedback  equalizer  (GDFE)  [37].  We  also 
introduce  the  more  recent  zero-forcing  "dirty  paper"  precoder  (ZFDP)  applied  to  the 
MIMO  broadcast  channels  [38]  [39]. 
3.1.1  VBLAST 

VBLAST is a simple suboptimal receiver interface used in MIMO systems when only CSIR
is available. For a MIMO system (2.1) with $M_t \le M_r$ and rank $K = M_t$, the
transmitter allocates independent bit streams across the $M_t$ transmitting antennas
with no precoding. To decode the transmitted information symbols, VBLAST first
estimates the signal with the spatial structure $h_{M_t}$, where $h_i$ denotes the $i$th
column of H, and then cancels it out from the received signal vector. Next, it estimates
the signal with spatial structure $h_{M_t - 1}$, and so on. The signal estimator can be
either the ZF or the MMSE estimator. A proper reordering of the columns of H helps
improve the BER performance [5]. This decoding scheme involves sequential nulling and
cancellation, which is proved to be equivalent to the generalized decision feedback
equalizer (GDFE) [37].

The ZF nulling step in the VBLAST scheme can be represented by the QR decomposition
H = QR, where Q is an $M_r \times K$ matrix with orthonormal columns and R is a
$K \times K$ upper triangular matrix. Let us rewrite (2.1) as

\[ y = QRx + z. \tag{3.1} \]

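Multiplying both sides of (3.1) by $Q^*$ yields

\[ \tilde{y} = Rx + \tilde{z}, \tag{3.2} \]

where $\tilde{y} = Q^* y$ and $\tilde{z} = Q^* z$, or

\[
\begin{bmatrix} \tilde{y}_1 \\ \tilde{y}_2 \\ \vdots \\ \tilde{y}_K \end{bmatrix}
=
\begin{bmatrix}
r_{11} & r_{12} & \cdots & r_{1K} \\
0 & r_{22} & \cdots & r_{2K} \\
\vdots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & r_{KK}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_K \end{bmatrix}
+
\begin{bmatrix} \tilde{z}_1 \\ \tilde{z}_2 \\ \vdots \\ \tilde{z}_K \end{bmatrix}.
\tag{3.3}
\]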

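The sequential signal detection is as follows:

    for i = K : -1 : 1
        \hat{x}_i = \mathcal{C}\left[ \left( \tilde{y}_i - \sum_{j=i+1}^{K} r_{ij} \hat{x}_j \right) / r_{ii} \right]
    end

where $\mathcal{C}$ stands for mapping to the nearest symbol in the symbol constellation.
Ignoring the error-propagation effect, we see that the MIMO channel is decomposed into K
parallel scalar subchannels

\[ \tilde{y}_i = r_{ii} x_i + \tilde{z}_i, \quad i = 1, 2, \ldots, K. \tag{3.4} \]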
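The nulling-and-cancellation loop above amounts to back-substitution with a slicing step. Below is a minimal sketch, assuming the QR step has already produced an upper triangular R and the rotated observation; the function names, the QPSK alphabet, the matrix entries, and the noise samples are all hypothetical values chosen for illustration.

```python
def nearest_symbol(value, constellation):
    """Map a complex value to the closest constellation point (the slicer C[.])."""
    return min(constellation, key=lambda s: abs(value - s))

def sic_detect(R, y_tilde, constellation):
    """Successive nulling and cancellation for upper triangular R."""
    K = len(R)
    x_hat = [0j] * K
    for i in range(K - 1, -1, -1):  # i = K, K-1, ..., 1 in the text's indexing
        # Cancel the contribution of the symbols that were already detected.
        interference = sum(R[i][j] * x_hat[j] for j in range(i + 1, K))
        x_hat[i] = nearest_symbol((y_tilde[i] - interference) / R[i][i], constellation)
    return x_hat

QPSK = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
# Hypothetical upper triangular factor and transmitted symbols.
R = [[2.0, 0.5 - 0.3j, 0.1j],
     [0.0, 1.5, -0.4 + 0.2j],
     [0.0, 0.0, 1.0]]
x = [1 + 1j, -1 + 1j, -1 - 1j]
noise = [0.05 - 0.02j, -0.03 + 0.04j, 0.02 + 0.01j]
y_tilde = [sum(R[i][j] * x[j] for j in range(3)) + noise[i] for i in range(3)]
x_hat = sic_detect(R, y_tilde, QPSK)
print(x_hat == x)
```

When every earlier decision is correct, each step sees exactly the scalar channel of (3.4); a wrong decision would leave residual interference in all later steps, which is the error-propagation effect the text sets aside.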


3.1.2  ZF-DP 

We consider a broadcast MIMO channel with $M_t$ transmitting antennas and $M_r$
receiving antennas ($M_t \ge M_r$). The channel model is exactly the same as (2.1) and
CSIT is available. However, the receiving antennas cannot cooperate with each other. A
vector transmission scheme was proposed in [40], which combines the QR decomposition
and "dirty paper" precoding. We refer to this approach as zero-forcing "dirty paper"
precoding (ZFDP). (The use of the "dirty paper" phrase is due to Costa [41].)

The  ZFDP  scheme  resembles  the  zero-forcing  VBLAST  method.  It  also  goes 
through  the  sequential  nulling  and  cancellation  procedure.  The  only  difference  is  that 
all  these  operations  are  done  by  the  transmitter. 

By assuming H to be of full row rank, i.e., $K = M_r$, ZFDP also begins with the
QR decomposition $H^* = QR$. Let us rewrite (2.1) as

\[ y = R^* Q^* x + z. \tag{3.5} \]

Denoting $x = Q\tilde{x}$ yields

\[ y = R^* \tilde{x} + z, \tag{3.6} \]

or

\[
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_K \end{bmatrix}
=
\begin{bmatrix}
r_{11} & 0 & \cdots & 0 \\
r_{12}^{*} & r_{22} & \cdots & 0 \\
\vdots & \ddots & \ddots & \vdots \\
r_{1K}^{*} & r_{2K}^{*} & \cdots & r_{KK}
\end{bmatrix}
\begin{bmatrix} \tilde{x}_1 \\ \tilde{x}_2 \\ \vdots \\ \tilde{x}_K \end{bmatrix}
+
\begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_K \end{bmatrix}.
\tag{3.7}
\]

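Denote $s \in \mathbb{C}^{K \times 1}$ to be the symbol vector destined for the K receivers.
We wish to have $\tilde{x}$ satisfying

\[
\begin{bmatrix} r_{11} s_1 \\ r_{22} s_2 \\ \vdots \\ r_{KK} s_K \end{bmatrix}
=
\begin{bmatrix}
r_{11} & 0 & \cdots & 0 \\
r_{12}^{*} & r_{22} & \cdots & 0 \\
\vdots & \ddots & \ddots & \vdots \\
r_{1K}^{*} & r_{2K}^{*} & \cdots & r_{KK}
\end{bmatrix}
\begin{bmatrix} \tilde{x}_1 \\ \tilde{x}_2 \\ \vdots \\ \tilde{x}_K \end{bmatrix}.
\tag{3.8}
\]

The solution to (3.8) is

\[ \tilde{x} = R^{-*} \, \mathrm{diag}\{R\} \, s. \tag{3.9} \]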

However, the matrix inversion can amplify the norm of $\tilde{x}$ significantly, which can
lead to additional power consumption at the transmitter. By exploiting the finite alphabet
property of the communication signals, the modulo arithmetic precoder (more recently
known as the Tomlinson-Harashima precoder [42], [43]) can be applied to bound the
value of the transmitted signal. Moreover, trellis precoding can be used to eliminate
the 1.53 dB shape loss of Tomlinson-Harashima precoding [44]. The ZFDP transmission
scheme decomposes the MIMO channel into K parallel scalar channels (see [40] for more
details)

\[ y_i = r_{ii} s_i + z_i, \quad i = 1, 2, \ldots, K. \tag{3.10} \]
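The pre-subtraction in (3.9) can be carried out by forward substitution on the lower triangular matrix $R^*$, without forming an explicit inverse. The sketch below illustrates this under stated assumptions: the function name, the matrix entries, and the symbols are hypothetical, the modulo (Tomlinson-Harashima) step is deliberately left out, and noise is omitted so that the result can be checked exactly.

```python
def zfdp_presubtract(R, s):
    """Solve R^* x_tilde = diag(R) s by forward substitution (R upper triangular)."""
    K = len(R)
    x_tilde = [0j] * K
    for i in range(K):
        # Entry (i, j) of R^* is conj(R[j][i]); streams j < i interfere with stream i.
        interference = sum(complex(R[j][i]).conjugate() * x_tilde[j] for j in range(i))
        x_tilde[i] = (R[i][i] * s[i] - interference) / complex(R[i][i]).conjugate()
    return x_tilde

# Hypothetical upper triangular factor with one weak diagonal entry, and QPSK symbols.
R = [[2.0, 0.5 - 0.3j, 0.1j],
     [0.0, 1.5, -0.4 + 0.2j],
     [0.0, 0.0, 0.2]]
s = [1 + 1j, -1 + 1j, -1 - 1j]
x_tilde = zfdp_presubtract(R, s)

# Noiseless receive side: y = R^* x_tilde, one row per non-cooperating receiver.
y = [sum(complex(R[j][i]).conjugate() * x_tilde[j] for j in range(i + 1)) for i in range(3)]
power_gain = sum(abs(v) ** 2 for v in x_tilde) / sum(abs(v) ** 2 for v in s)
print(y, power_gain)
```

Each receiver sees only its own scaled symbol, matching (3.10), while the ratio `power_gain` exceeds one here because of the small third diagonal entry; this is the norm amplification that motivates the modulo precoder.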

Several remarks are now in order. a) VBLAST is shown to achieve only about 72% of the
capacity [5]. That is because imposing the same rate of transmission on all the
transmitters makes the channel capacity limited by the worst of the K scalar subchannels.
b) VBLAST has a diversity gain of only $M_r - M_t + 1$. c) ZFDP can achieve the
broadcast channel capacity for high SNR [39], but the subchannels have different fading
levels. Hence the transmitter, just like the aforementioned linear transceivers, has to
consider the tradeoff between the BER performance and the channel throughput. d) The
ZFDP scheme causes no error propagation, and thus (3.10) is precise. e) Both VBLAST
and ZFDP involve nonlinear operations.

3.2    Geometric Mean Decomposition for MIMO Transceiver Design

Note that VBLAST assumes no cooperation among the transmitting antennas and
ZFDP assumes no cooperation among the receivers. A natural question then arises: if
both CSIR and CSIT are available, can we exploit both to improve performance? We
attempt to address this question next.

In the sequel, we assume that the same signal constellation is used in all the
independent symbol streams to reduce the system complexity. This is consistent with the
HIPERLAN/2 and IEEE 802.11 standards. Then the overall BER performance of the
system will be limited by the subchannel with the lowest SNR. To mitigate this problem,
based on (3.4) and (3.10), we consider the following optimization problem

\[
\begin{aligned}
\max_{Q, P} \quad & \min \{ r_{ii} : 1 \le i \le K \} \\
\text{subject to} \quad & R = Q^* H P \\
& R \in \mathbb{R}^{K \times K}, \ r_{ij} = 0 \ \text{for} \ i > j \\
& r_{ii} > 0 \ \text{for} \ 1 \le i \le K \\
& Q^* Q = P^* P = I_K
\end{aligned}
\tag{3.11}
\]

where  the  semi-unitary  matrices  Q  and  P  denote  the  linear  operations  at  the  receiver 
and  transmitter,  respectively. 

Since both Q and P are semi-unitary matrices, we have
$\prod_{n=1}^{K} r_{nn} = \prod_{n=1}^{K} \lambda_{H,n}$, where $\{\lambda_{H,n}\}_{n=1}^{K}$ are the K
non-zero singular values of H. In Chapter 6 we show that if there exist semi-unitary
matrices P and Q satisfying

\[ H = QRP^*, \quad \text{or equivalently,} \quad R = Q^* H P, \tag{3.12a} \]

where the diagonal elements of R are given by

\[ r_{ii} = \bar{\lambda}_H \triangleq \left( \prod_{n=1}^{K} \lambda_{H,n} \right)^{1/K}, \quad 1 \le i \le K, \tag{3.12b} \]

then the R in (3.12) is the solution to (3.11). The detailed treatment of the
decomposition (3.12) is deferred to Chapter 6. We refer to this decomposition as the
geometric mean decomposition (GMD) since the diagonal elements of R are the geometric
mean of $\{\lambda_{H,n}\}_{n=1}^{K}$. A computationally efficient and numerically stable algorithm is
proposed in Section 6.2 to calculate the decomposition.
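The optimality claim can be read off from the product identity stated above. The following short derivation is a sketch of that reasoning step only; it takes the identity $\prod_n r_{nn} = \prod_n \lambda_{H,n}$ as given and does not address the existence of the decomposition, which the text defers to Chapter 6.

```latex
% The smallest of K positive numbers never exceeds their geometric mean:
\min_{1 \le i \le K} r_{ii}
  \;\le\; \Big( \prod_{n=1}^{K} r_{nn} \Big)^{1/K}
  \;=\; \Big( \prod_{n=1}^{K} \lambda_{H,n} \Big)^{1/K}
  \;=\; \bar{\lambda}_H ,
% with equality if and only if r_{11} = r_{22} = \cdots = r_{KK} = \bar{\lambda}_H.
```

Hence any feasible R with equal diagonal elements attains the upper bound on the objective of (3.11).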

It seems reasonable to constrain the linear equalizer Q to be semi-unitary since it
will keep the background noise white. Yet it seems unnecessary to constrain P to be
semi-unitary as well. Indeed, the constraint that P and Q should be semi-unitary is in
fact inactive, as shown in the following lemma established in Section 6.2.1.

Lemma 3.2.1 The GMD of (3.12) is also the solution to the following optimization
problem with relaxed constraints:

\[
\begin{aligned}
\max_{Q, P} \quad & \min \{ r_{ii} : 1 \le i \le K \} \\
\text{subject to} \quad & R = Q^* H P, \ r_{ij} = 0 \ \text{for} \ i > j, \ R \in \mathbb{R}^{K \times K}, \\
& r_{ii} > 0, \ 1 \le i \le K, \\
& \mathrm{tr}(Q^* Q) \le K, \ \mathrm{tr}(P^* P) \le K.
\end{aligned}
\tag{3.13}
\]

Proof: Omitted. See Section 6.2.1 for details. ■
The GMD, which can be viewed as an extended QR decomposition, can be readily
combined with the aforementioned VBLAST (GDFE) or ZFDP. GMD-VBLAST is
implemented as follows. We first calculate the GMD $H = QRP^*$. Next we choose the
precoder F = P; then the equivalent data model is

\[ y = QRx + z. \tag{3.14} \]

The next step is nothing but the VBLAST detector.

Ignoring the error propagation effect, we can regard the resulting subchannels as K
independent and identical subchannels

\[ \tilde{y}_i = \bar{\lambda}_H x_i + \tilde{z}_i, \quad \text{for} \ i = 1, \ldots, K. \tag{3.15} \]

The GMD-ZFDP scheme is similar to GMD-VBLAST because of the duality between
VBLAST and ZFDP.

3.3    Performance Analyses and Implementation Issues

In this section, we first present the performance analyses of the GMD scheme from the
capacity perspective, from which we demonstrate the advantages of our GMD scheme
over the linear transceivers. Next, we consider combining the GMD scheme with
blind two-way channel subspace tracking in the TDD scenario. To achieve close to
optimal performance at low SNR, we propose to combine GMD with subchannel selection.
Finally, we discuss the relationship between our GMD scheme and [30].
3.3.1    Performance  Analyses 

As we have mentioned earlier, the overall BER performance of a MIMO communication
system is dominated by the worst subchannels asymptotically for high SNR.
Hence the scheme optimizing the worst subchannel can enjoy the optimal BER
performance for high SNR. This observation is also the motivation of the aforementioned
MMD scheme. As a major advantage over the linear transceiver schemes, the GMD
scheme is also asymptotically optimal in terms of the channel capacity for high SNR, as
we will show below.

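If the signal power is allocated evenly to the K subchannels, then based on (3.15),
we get

\[ C_{GMD} = K \log_2 \left( 1 + \frac{\rho}{K} \bar{\lambda}_H^2 \right), \tag{3.16} \]

where $\rho$ is defined in (2.2). The channel capacity with uniform power loading on the K
subchannels is (see (2.50))

\[ C_{UPL} = \sum_{n=1}^{K} \log_2 \left( 1 + \frac{\rho}{K} \lambda_{H,n}^2 \right). \tag{3.17} \]

It follows from (3.16) and (3.17) that

\[ C_{UPL} - C_{GMD} = \log_2 \frac{\prod_{n=1}^{K} \left( 1 + \frac{\rho}{K} \lambda_{H,n}^2 \right)}{\left( 1 + \frac{\rho}{K} \bar{\lambda}_H^2 \right)^K}. \tag{3.18} \]

From (3.12b) and (3.18), we have

\[ \lim_{\rho \to \infty} C_{UPL} - C_{GMD} = 0. \tag{3.19} \]

Based on Lemma 2.1.1,

\[ \lim_{\rho \to \infty} C_{IT} - C_{UPL} = 0. \tag{3.20} \]

Hence, it follows from (3.19) and (3.20) that

\[ \lim_{\rho \to \infty} C_{IT} - C_{GMD} = 0, \tag{3.21} \]

i.e., for high SNR the GMD scheme is asymptotically optimal.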

Hence the GMD scheme does not need to make the tradeoff between the information
rate and BER performance as the conventional linear transceivers do. Instead, our GMD
scheme can achieve the optimum on both aspects simultaneously for high SNR.
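The vanishing gap between uniform power loading and GMD can be checked numerically. The sketch below is illustrative only: it assumes a per-subchannel SNR of $\rho/K$ in both capacity expressions, and the singular values and SNR grid are hypothetical.

```python
import math

def c_upl(rho, singular_values):
    """Capacity with uniform power loading over the K subchannels (bps/Hz)."""
    k = len(singular_values)
    return sum(math.log2(1 + rho / k * s * s) for s in singular_values)

def c_gmd(rho, singular_values):
    """Capacity of K identical subchannels with gain equal to the geometric mean."""
    k = len(singular_values)
    geo_mean_sq = math.exp(2 * sum(math.log(s) for s in singular_values) / k)
    return k * math.log2(1 + rho / k * geo_mean_sq)

singular_values = [2.0, 1.0, 0.5, 0.1]   # hypothetical channel
snrs_db = (10, 30, 50, 70)
gaps = [c_upl(10 ** (db / 10), singular_values) - c_gmd(10 ** (db / 10), singular_values)
        for db in snrs_db]
for db, gap in zip(snrs_db, gaps):
    print(f"{db} dB: gap = {gap:.5f} bps/Hz")
```

The gap is nonnegative at every SNR and shrinks toward zero as the SNR grows, while at 10 dB it is still a few bps/Hz for this ill-conditioned example.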

As we have mentioned before, VBLAST may suffer from error propagation. Hence
the BER performance of GMD-VBLAST will be inferior to the scalar equivalence in
(3.15). We calculate the upper bound of the GMD-VBLAST BER as follows. For a
fixed SNR $\rho$, we assume that the system of (3.15) has symbol error rate (SER) $P_e$, i.e.,
each subchannel has SER $P_e/K$. We consider the worst case that decoding errors in some
subchannels will cause the failure of the decoding in all the subsequent subchannels. The
SER upper bound is readily calculated as

\[
\begin{aligned}
P_{e,\mathrm{GMD\text{-}VBLAST}} &= \frac{1}{K} \sum_{n=0}^{K-1} (1 - P_e)^n (K - n) P_e \\
&\le \frac{1}{K} \sum_{n=0}^{K-1} (K - n) P_e \\
&= \frac{K+1}{2} P_e.
\end{aligned}
\tag{3.22}
\]

For a moderate K, say K < 10, the performance loss caused by the error propagation is
rather small. For a system with high dimensionality, GMD-ZFDP is a better choice since
it causes no error propagation. On the other hand, the Tomlinson-Harashima precoder
leads to an input power increase of $\frac{M}{M-1}$ for M-QAM.
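Both quantities in this paragraph are easy to tabulate. The sketch below evaluates the worst-case sum and its closed-form bound $\frac{K+1}{2} P_e$, together with the $M/(M-1)$ power increase expressed in dB; the function names and the sample values of $P_e$, K, and M are hypothetical.

```python
import math

def worst_case_ser(pe, k):
    """(1/K) * sum_{n=0}^{K-1} (1 - pe)^n (K - n) pe: first error at step n+1 ruins the rest."""
    return sum((1 - pe) ** n * (k - n) * pe for n in range(k)) / k

def ser_upper_bound(pe, k):
    """Closed-form bound (K + 1) / 2 * pe obtained by dropping the (1 - pe)^n factors."""
    return (k + 1) / 2 * pe

def thp_power_increase_db(m):
    """Input power increase M / (M - 1) of modulo precoding for M-QAM, in dB."""
    return 10 * math.log10(m / (m - 1))

for k in (2, 4, 10):
    print(k, worst_case_ser(1e-3, k), ser_upper_bound(1e-3, k))
print(f"16-QAM: {thp_power_increase_db(16):.3f} dB, 64-QAM: {thp_power_increase_db(64):.3f} dB")
```

For small $P_e$ the exact sum and the bound nearly coincide, and the power penalty of the modulo precoder shrinks as the constellation grows.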

3.3.2    Combination  of  GMD  with  Two-way  Channel  Subspace  Tracking 

In TDD systems, the GMD scheme may be combined with two-way channel subspace
tracking techniques. The GMD algorithm, given in Chapter 6, starts with the
SVD. To calculate the matrix P (cf. (3.11)), we only need to know the singular values
$\Lambda$ and the right singular vectors V (cf. Chapter 6). Similarly, only $\Lambda$ and U are used
to calculate Q. Rewriting (2.1) with the precoder F = P yields

\[ y = HPx + z. \tag{3.23} \]

Since the GMD scheme uses the same signal constellation and uniform power allocation,
the covariance matrix of x is a scaled identity matrix, i.e., $E[xx^*] = \sigma_x^2 I$. Hence,

\[ R_y = E[yy^*] = \sigma_x^2 HH^* + \sigma_z^2 I. \tag{3.24} \]

If the signal power and the noise power are known a priori, we have
$HH^* = (R_y - \sigma_z^2 I)/\sigma_x^2$. Applying SVD to $HH^*$, we get

\[ HH^* = U \Lambda^2 U^*. \tag{3.25} \]

The GMD algorithm can be applied based on U and $\Lambda$ to get the matrices Q and R,
which are sufficient for decoding. If a TDD system is used, the reverse channel, where
the roles of the previous transmitter and receiver are exchanged, can be modeled as

\[ y_{rev} = H^T \bar{Q} x_{rev} + z_{rev}, \tag{3.26} \]

where the subscript "rev" means "reverse channel". Define

\[ R_{\bar{y}_{rev}} = E[\bar{y}_{rev} \, y_{rev}^T], \tag{3.27} \]

where $\bar{y}$ denotes the complex conjugate of y. Using a similar argument, we have

\[ H^* H = V \Lambda^2 V^*. \tag{3.28} \]

Then  the  reverse  receiver,  i.e.,  the  previous  transmitter,  can  calculate  R  and  P  from  V 
and  A.  Channel  subspace  tracking  techniques  (see,  e.g.,  [45]  [46])  can  be  used  to  estimate 
U,  V  and  A  efficiently.  Hence  our  GMD  scheme  can  be  applied  without  the  need  of 
using  training  symbols  for  channel  estimation.  We  note  that  this  merit  of  GMD  is  not 
shared  by  the  conventional  transceiver  schemes  introduced  in  Section  2.3  since  all  those 
methods  allocate  different  powers  to  different  subchannels,  which  makes  it  difficult,  if 
not  impossible,  to  estimate  the  singular  values  in  A.  Of  course,  if  the  same  power  is 
allocated  to  each  eigen-subchannel,  this  blind  two-way  channel  subspace  tracking  idea 
can  also  be  combined  with  the  SVD  based  schemes,  at  the  cost  of  significant  capacity 
loss. 

The GMD scheme can be made backward compatible with TDD systems using
VBLAST decoders. By using CSIT or blind subspace tracking techniques, the
transmitter can calculate the linear precoder F. Hence it can always precode the
transmitted data x as Px, even when sending the training data. Thus the receiver is
"fooled" into believing that the channel is the virtual one $H_{vt} = HP = QR$. Although
the linear precoder P is made transparent to the VBLAST detector, the decoder still
enjoys the multiple identical subchannels due to the linear precoder F = P.
3.3.3    Subchannel  Selection 

The previous discussion is based on the assumption that all the subchannels
corresponding to positive singular values are used for signal transmission. However, in
practical scenarios, some of the positive singular values of the channel matrix H can be
very small. This situation occurs for spatially correlated flat fading channels, or even
i.i.d. Rayleigh flat fading channels with $M_r \approx M_t > 1$. From (3.12b), we see that
such small singular values degrade the overall channel quality, and hence subchannel
selection is helpful. The other situation where subchannel selection is needed is the case
when the input power is low or moderate. In this section, we propose a simple algorithm
to select the subchannels, which is numerically verified to achieve near optimal capacity
even at low SNR.

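Let us sort the singular values of H as
$\lambda_{H,1} \ge \lambda_{H,2} \ge \cdots \ge \lambda_{H,K} > 0$. If GMD is
constrained to the first $n \le K$ eigen subchannels, we obtain n identical subchannels

\[ \tilde{y}_i = \bar{\lambda}_n x_i + \tilde{z}_i, \quad \text{for} \ i = 1, \ldots, n, \tag{3.29} \]

where

\[ \bar{\lambda}_n = \left( \prod_{i=1}^{n} \lambda_{H,i} \right)^{1/n}. \tag{3.30} \]

To maximize the channel throughput with our GMD scheme, we need to solve the
following problem

\[ \max_{1 \le n \le K} \ n \log_2 \left( 1 + \frac{\rho}{n} \bar{\lambda}_n^2 \right), \tag{3.31} \]

or

\[ \max_{1 \le n \le K} \ \left( 1 + \frac{\rho}{n} \prod_{i=1}^{n} \lambda_{H,i}^{2/n} \right)^{n}. \tag{3.32} \]

The solution to this problem is straightforward. We can use either linear search or
the bisection method to find the optimal n.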
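The linear search mentioned above takes only a few lines. The sketch below assumes the throughput of the first n subchannels is $n \log_2(1 + \frac{\rho}{n} \bar{\lambda}_n^2)$ with total SNR $\rho$ split evenly; the function name and the singular values are hypothetical.

```python
import math

def select_subchannels(singular_values, rho):
    """Linear search for the n that maximizes n * log2(1 + rho / n * geo_mean_n ** 2)."""
    ordered = sorted(singular_values, reverse=True)
    best_n, best_rate = 0, 0.0
    log_product = 0.0
    for n, s in enumerate(ordered, start=1):
        log_product += math.log(s)                 # running log of prod_{i<=n} lambda_i
        geo_mean_sq = math.exp(2 * log_product / n)
        rate = n * math.log2(1 + rho / n * geo_mean_sq)
        if rate > best_rate:
            best_n, best_rate = n, rate
    return best_n, best_rate

singular_values = [2.0, 1.0, 0.5, 0.01]   # hypothetical, with one very weak subchannel
for rho in (1.0, 1e4, 1e8):
    n, rate = select_subchannels(singular_values, rho)
    print(f"rho = {rho:g}: use n = {n} subchannels, rate = {rate:.2f} bps/Hz")
```

In this example the weak fourth subchannel is excluded until the SNR is very high, and at low SNR only the strongest subchannel is used, which mirrors the two situations the text gives for subchannel selection.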

Several remarks are in order. i) It is straightforward to incorporate the channel
selection into the GMD algorithm. In Section 6.2.2, we show that GMD starts from the
SVD $H = U \Lambda V^*$ and then applies a series of Givens transformations to $\Lambda$ to make it
upper triangular. The Givens transformations can be constrained to the first $n \le K$
diagonal elements of $\Lambda$. ii) The blind channel subspace tracking can be combined with
the subchannel selection strategy seamlessly. If only the subchannels corresponding to
the largest $n \le K$ singular values are selected, the blind channel tracking technique will
track the n-dimensional subspace automatically. iii) The performance loss of the GMD
scheme in the low SNR region is due to the well-known fact that the zero-forcing equalizer
is inherently suboptimal. In the next chapter, we propose the so-called uniform channel
decomposition (UCD) scheme, which can decompose a MIMO channel into multiple
identical subchannels in a strictly capacity lossless manner.
3.3.4    Further  Remarks 

The author later noticed [30], in which an idea similar to GMD was proposed to
approach the performance of the ML detector in the ISI suppression scenario. For a
SISO ISI channel, if symbols are precoded and transmitted in a block manner, then the
data model (2.39) can be used to represent the received block data (cf. (2.3) and (2.4)).
Note that for this case, H is a Toeplitz matrix due to the time-invariant property of
the ISI channel. A linear precoder design F was proposed in [30] such that the virtual
channel $H_{vt} = HF$ can be decomposed via QR decomposition as $H_{vt} = QR$, where
R has equal diagonal elements. We see that this equal diagonal idea is equivalent to
GMD. However, our GMD scheme, independently motivated by the MIMO transceiver
design problem, has several major advantages over the algorithm in [30]:

1. Our GMD scheme represents a paradigm shift from the conventional linear
transceiver designs to nonlinear designs and can be proven, both numerically and
theoretically, to have superior performance from both BER and information theoretic
aspects.

2. Our GMD algorithm is computationally much more efficient than that of [30].
Both algorithms start from the SVD of H, which is followed by K - 1 iterations.
The GMD involves 2K - 2 fast Givens rotations. For a channel H with
$M_t = M_r = K$, the SVD requires $O(K^3)$ flops while the GMD requires an additional
$O(K^2)$ flops. Thus the computational complexity of the GMD scheme is comparable
to that of the conventional linear transceiver schemes. However, the algorithm in [30]
involves multiplications and inversions of matrices in each iteration, and the overall
computational burden turns out to be an additional $O(K^4)$ flops.

3. For the GMD algorithm, only the information of $HH^*$, and hence $\Lambda$ and U, is
needed to calculate Q. However, for the algorithm in [30], the equalizer needs
to know both the precoder F and H, and hence $H_{vt} = HF$, in order to apply
the traditional QR to $H_{vt}$. Hence it cannot be combined with the blind two-way
channel subspace tracking algorithm introduced in Section 3.3.2.

Like the algorithm in [30], the GMD scheme can also be combined with orthogonal
frequency division multiplexing (OFDM) for ISI suppression. For a SISO ISI channel
with memory L,

\[ y(n) = \sum_{l=0}^{L-1} h_l \, x(n - l) + z(n), \tag{3.33} \]

after applying OFDM with block length N, we get a MIMO channel

\[ y = Dx + z, \tag{3.34} \]

where D is a diagonal matrix with the diagonal elements equal to the N-point FFT of
$h = [h_0, h_1, \ldots, h_{L-1}]^T$. Hence the GMD scheme can be applied directly. We expect
that GMD-ZFDP may have better BER performance than GMD-VBLAST if $N \gg 1$, in
which case GMD-VBLAST may suffer from considerable performance degradation
due to error propagation.
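The diagonal model (3.34) rests on the fact that, with a cyclic prefix, the channel acts as a circular convolution, which the DFT turns into a per-tone multiplication. The sketch below checks this with a naive DFT and then computes the geometric mean of the tone gains, which by (3.12b) is the common subchannel gain GMD would produce. The taps, block length, and symbols are hypothetical, and the cyclic-prefix handling itself is assumed rather than implemented.

```python
import cmath
import math

def dft(seq, n):
    """Naive n-point DFT of seq, zero-padded to length n."""
    padded = list(seq) + [0.0] * (n - len(seq))
    return [sum(padded[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def circular_convolve(h, x):
    """Circular convolution of taps h with one block x (effect of a cyclic prefix)."""
    n = len(x)
    return [sum(h[l] * x[(t - l) % n] for l in range(len(h))) for t in range(n)]

N = 8                                   # hypothetical block length
h = [1.0, 0.5, -0.25, 0.1]              # hypothetical channel taps, L = 4
x = [1, -1, 1j, -1j, 1, 1, -1, -1j]     # one block of frequency-independent test symbols

d = dft(h, N)                           # diagonal of D in (3.34)
lhs = dft(circular_convolve(h, x), N)
rhs = [d[k] * xk for k, xk in enumerate(dft(x, N))]
max_error = max(abs(a - b) for a, b in zip(lhs, rhs))

gains = [abs(v) for v in d]             # singular values of the diagonal channel D
geo_mean_gain = math.exp(sum(math.log(g) for g in gains) / N)
print(max_error, min(gains), geo_mean_gain, max(gains))
```

The tone gains vary widely across the band, whereas the geometric mean is a single value between the weakest and strongest tone.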

3.4    Performance  Examples 

We  present  next  several  numerical  examples  to  demonstrate  the  effectiveness  of  the 
GMD  scheme.  In  all  the  examples,  we  assume  Rayleigh  independent  flat  fading  channels. 

In the first example, we consider a Rayleigh flat fading channel with $M_t = 4$ and
$M_r = 4$. We compute the Shannon capacities of the channel with both CSIR and CSIT
($C_{IT}$, (2.10)), the channel with uninformed transmitter ($C_{UT}$, (2.11)), the channel using
the GMD scheme ($C_{GMD}$, (3.16)), the channel using the MTM scheme ($C_{MTM}$, (2.49)),
and the channel using the MMD scheme ($C_{MMD}$, (2.55)). We average the capacities over
1000 Monte-Carlo-generated H realizations. The result is presented in Figure 3-1. We
note that the capacity loss of the MMD scheme is about twice that of the MTM scheme
at high SNR, as predicted in Section 2.3. The relative capacity loss of the MMD scheme
compared with MTM is smaller at low SNR because some subchannels are not used at
low SNR. The GMD scheme outperforms the linear transceiver designs when the SNR
is moderate or high and is asymptotically capacity lossless at high SNR.

Figure 3-2 shows the complementary cumulative distribution functions (CCDF) of
the channel capacities of a 5 x 5 independent Rayleigh flat fading channel with SNR
equal to 23 dB. The five thin dashed curves denote the channel capacities of the five
subchannels obtained via SVD plus water filling. Note that the leftmost thin curve crosses
the vertical axis at a value less than one, which means that the worst subchannel
(corresponding to the smallest singular value of the channel matrix) is sometimes discarded
by water filling. The thick line is the CCDF of each subchannel capacity obtained via
GMD. Figure 3-2 further illustrates the disadvantages of the conventional "SVD plus bit
allocation" scheme (see, e.g., [19] [20] [23]). The channel capacities of the 5 subchannels
obtained via SVD plus water filling range from 0 to about 10 bps/Hz, which suggests
that BPSK or QPSK modulation should be used to match the capacity of the worst
subchannel and something like 512 or 1024 QAM to match the best subchannel. This bit
allocation significantly increases the modulation/demodulation complexity. Moreover,
using a constellation with size greater than 256 is impractical for the current RF circuit
design technology. For the GMD scheme, on the other hand, the same constellation with
a moderate size, say 64-QAM, can be applied to reap most of the channel capacity.

To demonstrate the effectiveness of the subchannel selection approach, we consider
a 10 x 10 independent Rayleigh flat fading channel. The channel is usually ill-conditioned
since some singular values of H are very close to zero. Without the subchannel selection
strategy, GMD suffers from performance degradation, especially at low SNR, as seen
in Figure 3-3. On the other hand, with the subchannel selection scheme, there is only
about 0.2 bit/sec/Hz rate loss compared with $C_{IT}$, even at very low SNR.

We compare the BER performance of the GMD-VBLAST scheme with the unprecoded
MMSE-VBLAST scheme with the optimal detection ordering, the MTM scheme,
and the MMD scheme. No error correcting code is used in the simulations. In
Figure 3-4(a), $H \in \mathbb{C}^{4 \times 2}$ has i.i.d. Rayleigh fading elements. Hence the
channel matrix is usually well-conditioned. Two independent symbol streams modulated
as 16-QAM are transmitted. The figure is obtained by averaging over 1000 Monte Carlo
trials of H. We see that the GMD scheme has more than one dB improvement over the
MMD scheme at moderate to high SNR. In Figure 3-4(b), $H \in \mathbb{C}^{4 \times 4}$ usually has a large
condition number, in which case the MMD scheme is subject to more capacity loss, as
analyzed in Section 2.3. Four independent symbol streams are transmitted. The BER
performance of the GMD scheme is much better than the others. We did not include
MTM because it discards some bad subchannels and hence cannot be used to transmit
four independent data streams.

In the final example, we combine the GMD scheme with 64-point FFT based
OFDM for ISI suppression in a SISO channel. We assume that the channel responses
$h_l$, $l = 0, 1, \ldots, L-1$, are independent zero-mean circularly symmetric Gaussian random
variables with unit variance. The channel length is L = 4. GMD-ZFDP is about 2
dB better than GMD-VBLAST. This is because GMD-VBLAST suffers from a
considerable error propagation effect. This result suggests that GMD-ZFDP may be
preferred over GMD-VBLAST if the channel has a large dimensionality.

3.5  Conclusions 

In this chapter, we have introduced a novel joint transceiver design, which combines
the geometric mean decomposition (GMD) with the VBLAST equalizer or dirty paper
precoder. The GMD scheme can decompose a MIMO channel into multiple identical
scalar subchannels. This desirable property can bring about much convenience to the
practical system design, particularly the symbol constellation selection. Moreover, we
have shown that the GMD scheme is optimal asymptotically for high SNR in terms of
both information rate and BER performance, while the computational complexity of our
scheme is comparable with that of the conventional linear transceiver schemes.
Furthermore, we have shown that the GMD scheme can be applied without the need of
using training symbols for channel estimation if combined with subspace tracking
techniques. We have also considered the issue of subchannel selection when some of the
subchannels are too poor to be useful. The GMD scheme can also be combined with OFDM
for ISI suppression. Both theoretical analyses and empirical simulations have been provided
to validate the effectiveness of our approaches.

Figure 3-1: Average capacity over 1000 Monte Carlo trials vs. SNR with $M_t = 4$ and
$M_r = 4$ for i.i.d. Rayleigh flat fading channels.

Figure 3-2: Complementary cumulative distribution functions of the capacities of 5
subchannels of the i.i.d. Rayleigh flat fading channel with $M_t = 5$ and $M_r = 5$. Results
based on 2000 Monte Carlo trials.

Figure 3-3: Complementary cumulative distribution function of the capacity of an i.i.d.
Rayleigh flat fading channel with $M_t = 10$ and $M_r = 10$. Results based on 1000 Monte
Carlo trials. SNR = (a) 0 dB, (b) 10 dB, (c) 20 dB, and (d) 30 dB.

Figure 3-4: BER performance averaged over 1000 Monte Carlo trials of i.i.d. Rayleigh
flat fading channel vs. SNR with (a) $M_t = 2$ and $M_r = 4$ and (b) $M_t = 4$ and $M_r = 4$.

Figure 3-5: BER performances of GMD-VBLAST and GMD-ZFDP. Both are combined
with OFDM for ISI suppression (plot title: GMD+OFDM, N = 64, L = 4, 64-QAM).


CHAPTER  4 
UNIFORM  CHANNEL  DECOMPOSITION 

We have seen in Chapter 3 that the GMD scheme can perform much better than the
conventional linear transceivers. However, the GMD scheme may suffer from
considerable capacity loss at low SNR due to its inherent "zero-forcing" operation,
which is capacity lossy, especially at low SNR. In this chapter, we propose a
uniform channel decomposition (UCD) scheme, which is also based on the GMD matrix
decomposition algorithm, to decompose a MIMO channel into multiple identical
subchannels. The UCD scheme has two implementation forms. One is the combination
of a linear precoder and a minimum mean-squared-error VBLAST (MMSE-VBLAST)
detector, which is referred to as UCD-VBLAST, and the other includes a dirty paper
(DP) precoder and a linear equalizer followed by a DP decoder, which we refer to as
UCD-DP. Just like the GMD scheme, UCD simplifies the subsequent
modulation/demodulation and coding/decoding procedures by obviating the need
for bit allocation. Two remarkable merits of UCD, which are not shared by the GMD
scheme, are that first, UCD is strictly capacity lossless at any SNR, and second, UCD has
the maximal diversity gain. Moreover, the UCD scheme can decompose a MIMO channel
into an arbitrarily large number of independent subchannels, which is an enabling
technology to achieve high data rate transmission using small symbol constellations.
To facilitate the discussion, we recall the channel model given in (2.1) as follows.

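$$\mathbf{y} = \mathbf{H}\mathbf{F}\mathbf{x} + \mathbf{z}, \tag{4.1}$$

where $\mathbf{x} \in \mathbb{C}^{L \times 1}$ is the information symbol vector precoded by the linear precoder
$\mathbf{F} \in \mathbb{C}^{M_t \times L}$, $\mathbf{y} \in \mathbb{C}^{M_r \times 1}$ is the received signal, and $\mathbf{H} \in \mathbb{C}^{M_r \times M_t}$ is the channel
matrix with rank $K$. We assume $E[\mathbf{x}\mathbf{x}^*] = \sigma_x^2 \mathbf{I}_L$, and $\mathbf{z} \sim N(\mathbf{0}, \sigma_z^2 \mathbf{I}_{M_r})$ is the circularly
symmetric complex Gaussian noise. We define the input SNR as

$$\rho = \frac{E[\mathbf{x}^*\mathbf{F}^*\mathbf{F}\mathbf{x}]}{\sigma_z^2} = \frac{\sigma_x^2}{\sigma_z^2}\,\mathrm{Tr}\{\mathbf{F}^*\mathbf{F}\} \triangleq \frac{1}{\alpha}\,\mathrm{Tr}\{\mathbf{F}^*\mathbf{F}\}. \tag{4.2}$$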




4.1    Closed-Form  Representation  of  MMSE-VBLAST 

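The UCD scheme is based on the closed-form representation of the VBLAST scheme
using MMSE nulling vectors. For MMSE-VBLAST, the nulling vector for the ith layer
is

$$\mathbf{w}_i = \Big( \sum_{j=1}^{i} \mathbf{h}_j \mathbf{h}_j^* + \alpha \mathbf{I} \Big)^{-1} \mathbf{h}_i, \quad i = 1, \ldots, M_t. \tag{4.3}$$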

The  MMSE-VBLAST  algorithm  can  be  represented  in  a  concise  matrix  form  which  was 
given  in  [9]  (also  see  the  more  detailed  version  [47]). 
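Consider the augmented matrix

$$\mathbf{H}_a = \begin{bmatrix} \mathbf{H} \\ \sqrt{\alpha}\, \mathbf{I}_{M_t} \end{bmatrix}_{(M_r + M_t) \times M_t}. \tag{4.4}$$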


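Applying the QR decomposition to $\mathbf{H}_a$ yields

$$\mathbf{H}_a = \mathbf{Q}_{H_a} \mathbf{R}_{H_a} = \begin{bmatrix} \mathbf{Q}^u_{H_a} \\ \mathbf{Q}^l_{H_a} \end{bmatrix} \mathbf{R}_{H_a}, \tag{4.5}$$

where $\mathbf{R}_{H_a} \in \mathbb{C}^{M_t \times M_t}$ is an upper triangular matrix with positive diagonal elements and
$\mathbf{Q}^u_{H_a} \in \mathbb{C}^{M_r \times M_t}$. Note that $\mathbf{H} = \mathbf{Q}^u_{H_a} \mathbf{R}_{H_a}$ is not the QR decomposition of $\mathbf{H}$ since $\mathbf{Q}^u_{H_a}$
is not unitary. However, we can readily obtain the nulling vectors using $\mathbf{Q}^u_{H_a}$ and $\mathbf{R}_{H_a}$
as shown in the following lemma [47]:

Lemma 4.1.1 Let $\{\mathbf{q}_{H_a,i}\}_{i=1}^{M_t}$ denote the columns of $\mathbf{Q}^u_{H_a}$ and $\{r_{H_a,ii}\}_{i=1}^{M_t}$ the diagonal
elements of $\mathbf{R}_{H_a}$, where $\mathbf{Q}^u_{H_a}$ and $\mathbf{R}_{H_a}$ are given in (4.5). The nulling vectors of (4.3)
satisfy

$$\mathbf{w}_i = r_{H_a,ii}^{-1}\, \mathbf{q}_{H_a,i}, \quad i = 1, 2, \ldots, M_t. \tag{4.6}$$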


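Then the output signal-to-interference-and-noise ratio (SINR) of the ith layer (i.e., the
signal corresponding to $\mathbf{h}_i$) using $\mathbf{w}_i$ is

$$\rho_i = \frac{|\mathbf{h}_i^* \mathbf{w}_i|^2}{\alpha \|\mathbf{w}_i\|^2 + \sum_{j=1}^{i-1} |\mathbf{h}_j^* \mathbf{w}_i|^2}. \tag{4.7}$$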


Inserting (4.3) into (4.7), we can simplify (4.7) via some straightforward calculations to
be (see, e.g., [48])

$$\rho_i = \mathbf{h}_i^* \mathbf{C}_i^{-1} \mathbf{h}_i, \quad i = 1, \ldots, M_t, \tag{4.8}$$

where $\mathbf{C}_i = \sum_{j=1}^{i-1} \mathbf{h}_j \mathbf{h}_j^* + \alpha \mathbf{I}$.
The SINRs given in (4.8) are related to $\mathbf{R}_{H_a}$ as shown in the following lemma:
Lemma 4.1.2 The diagonal of $\mathbf{R}_{H_a}$ given in (4.5) and $\{\rho_i\}_{i=1}^{M_t}$ given in (4.8) satisfy

$$\alpha (1 + \rho_i) = r_{H_a,ii}^2, \quad i = 1, 2, \ldots, M_t. \tag{4.9}$$

Proof:   See  Appendix  A.  ■ 

An  immediate  corollary  follows. 
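Corollary 4.1.3 The MMSE-VBLAST detector is information lossless. That is,

$$\sum_{i=1}^{M_t} \log(1 + \rho_i) = \log \left| \alpha^{-1} \mathbf{H}^*\mathbf{H} + \mathbf{I} \right|, \tag{4.10}$$

where the right hand side of (4.10) is equal to (2.7) with $\mathbf{F} = \mathbf{I}_{M_t}$.
Proof: From (4.4) and (4.5), we have

$$\alpha^{-1} \mathbf{H}^*\mathbf{H} + \mathbf{I} = \alpha^{-1} \mathbf{H}_a^* \mathbf{H}_a = \alpha^{-1} \mathbf{R}_{H_a}^* \mathbf{R}_{H_a}. \tag{4.11}$$

Hence

$$\log \left| \alpha^{-1} \mathbf{H}^*\mathbf{H} + \mathbf{I} \right| = \sum_{i=1}^{M_t} \log\left( \alpha^{-1} r_{H_a,ii}^2 \right). \tag{4.12}$$

According to Lemma 4.1.2,

$$\log \left| \alpha^{-1} \mathbf{H}^*\mathbf{H} + \mathbf{I} \right| = \sum_{i=1}^{M_t} \log(1 + \rho_i).$$

■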

We  note  that  Corollary  4.1.3  coincides  with  the  findings  in  [48]. 
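The two lemmas and the corollary of this section can be checked numerically. The following is a minimal sketch, assuming NumPy and arbitrarily chosen sizes and value of α (these numbers are illustrative and not taken from the text). It builds the augmented matrix of (4.4), takes its QR decomposition, and compares the result with the nulling vectors of (4.3) and the SINRs of (4.8).

```python
import numpy as np

rng = np.random.default_rng(0)
Mr, Mt, alpha = 4, 3, 0.5          # illustrative sizes and alpha (assumed values)

# Random complex channel; column i of H is h_i.
H = (rng.standard_normal((Mr, Mt)) + 1j * rng.standard_normal((Mr, Mt))) / np.sqrt(2)

# Augmented matrix (4.4) and its thin QR decomposition (4.5).
Ha = np.vstack([H, np.sqrt(alpha) * np.eye(Mt)])
Q, R = np.linalg.qr(Ha)
# Rotate phases so that the diagonal of R is real and positive, as in the text.
phase = np.diag(R) / np.abs(np.diag(R))
Q, R = Q * phase, phase.conj()[:, None] * R
Qu = Q[:Mr, :]                     # upper Mr rows of Q
r = np.real(np.diag(R))

W = np.zeros((Mr, Mt), dtype=complex)
rho = np.zeros(Mt)
for i in range(Mt):
    h_i = H[:, i]
    # Nulling vector (4.3): layers 1..i are still present when layer i is detected.
    Hi = H[:, : i + 1]
    W[:, i] = np.linalg.solve(Hi @ Hi.conj().T + alpha * np.eye(Mr), h_i)
    # SINR (4.8): C_i collects the interfering layers 1..i-1 plus noise.
    Hp = H[:, :i]
    Ci = Hp @ Hp.conj().T + alpha * np.eye(Mr)
    rho[i] = np.real(h_i.conj() @ np.linalg.solve(Ci, h_i))

lemma_411 = np.allclose(W, Qu / r)                    # (4.6)
lemma_412 = np.allclose(alpha * (1 + rho), r**2)      # (4.9)
lhs = np.sum(np.log(1 + rho))
rhs = np.linalg.slogdet(H.conj().T @ H / alpha + np.eye(Mt))[1]
corollary_413 = np.isclose(lhs, rhs)                  # (4.10)
print(lemma_411, lemma_412, corollary_413)
```

The phase rotation after `np.linalg.qr` is needed only because NumPy does not guarantee a positive diagonal in R, which the text assumes.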

4.2  UCD-VBLAST 
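If we modify the precoder $\mathbf{F}$ given in (2.8) to be

$$\mathbf{F} = \mathbf{V} \mathbf{\Phi}^{1/2} \mathbf{\Omega}^*, \tag{4.13}$$

where $\mathbf{\Omega} \in \mathbb{C}^{L \times K}$ with $L \ge K$ (to avoid capacity loss, we should not choose $L < K$
in general) and $\mathbf{\Omega}^*\mathbf{\Omega} = \mathbf{I}$, then we see through inserting (4.13) into (2.7) that the
$\mathbf{F}$ given in (4.13) is also a precoder maximizing the channel throughput. However,
introducing $\mathbf{\Omega}$ brings much greater flexibility than the precoder of (2.8). In the following,
we concentrate on how to design $\mathbf{\Omega}$.
Given the precoder of (4.13), the virtual channel is

$$\mathbf{G} \triangleq \mathbf{H}\mathbf{F} = \mathbf{U} \mathbf{\Lambda} \mathbf{\Phi}^{1/2} \mathbf{\Omega}^* \triangleq \mathbf{U} \mathbf{\Sigma} \mathbf{\Omega}^*, \tag{4.14}$$

where $\mathbf{\Sigma} = \mathbf{\Lambda} \mathbf{\Phi}^{1/2}$ is a diagonal matrix with diagonal elements $\{\sigma_i\}_{i=1}^{K}$. Let $\mathbf{G}_a$ denote
the augmented matrix

$$\mathbf{G}_a = \begin{bmatrix} \mathbf{U} \mathbf{\Sigma} \mathbf{\Omega}^* \\ \sqrt{\alpha}\, \mathbf{I}_L \end{bmatrix}. \tag{4.15}$$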
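The claim that a semi-unitary Ω leaves the throughput unchanged can be illustrated with a short numerical check. This is a sketch under stated assumptions: the throughput is taken to be log det(I + α⁻¹ H F F* H*), which is the form suggested by (4.10) for (2.7); the power loading `phi` is an arbitrary positive vector standing in for the water-filling solution of (2.9); and Ω is drawn at random.

```python
import numpy as np

rng = np.random.default_rng(1)
Mr, Mt, L, alpha = 3, 4, 6, 0.2    # illustrative values (assumed)
H = (rng.standard_normal((Mr, Mt)) + 1j * rng.standard_normal((Mr, Mt))) / np.sqrt(2)

U, lam, Vh = np.linalg.svd(H, full_matrices=False)
K = len(lam)                       # rank K = min(Mr, Mt) with probability one
V = Vh.conj().T
phi = rng.uniform(0.5, 2.0, K)     # hypothetical power loading, not water filling

def throughput(F):
    """Assumed throughput expression: log det(I + H F F^* H^* / alpha)."""
    G = H @ F
    return np.linalg.slogdet(np.eye(Mr) + G @ G.conj().T / alpha)[1]

# Random semi-unitary Omega (L x K): orthonormalize a random L x K matrix.
Omega, _ = np.linalg.qr(rng.standard_normal((L, K)) + 1j * rng.standard_normal((L, K)))

F_plain = V @ np.diag(np.sqrt(phi))                   # precoder without Omega
F_omega = V @ np.diag(np.sqrt(phi)) @ Omega.conj().T  # precoder of the form (4.13)

print(throughput(F_plain), throughput(F_omega))
```

Both values agree because F F* = V Φ V* whenever Ω*Ω = I, so Ω is free to be chosen for other purposes.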

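The UCD scheme is based on the following lemma.
Lemma 4.2.1 For any matrix of the form given in (4.15), we can find a semi-unitary
matrix $\mathbf{\Omega} \in \mathbb{C}^{L \times K}$ such that the QR decomposition of $\mathbf{G}_a$ yields an upper triangular
matrix with equal diagonal elements.

Proof: Rewrite (4.15) as

$$\mathbf{G}_a = \begin{bmatrix} \mathbf{U} [\mathbf{\Sigma} \,\vdots\, \mathbf{0}_{K \times (L-K)}]\, \mathbf{\Omega}_0^* \\ \sqrt{\alpha}\, \mathbf{I}_L \end{bmatrix}, \tag{4.16}$$

where $\mathbf{\Omega}_0 \in \mathbb{C}^{L \times L}$ is a unitary matrix whose first $K$ columns form $\mathbf{\Omega}$. We further rewrite
(4.16) as

$$\mathbf{G}_a = \begin{bmatrix} \mathbf{I}_{M_r} & \mathbf{0} \\ \mathbf{0} & \mathbf{\Omega}_0 \end{bmatrix} \begin{bmatrix} \mathbf{U} [\mathbf{\Sigma} \,\vdots\, \mathbf{0}_{K \times (L-K)}] \\ \sqrt{\alpha}\, \mathbf{I}_L \end{bmatrix} \mathbf{\Omega}_0^*. \tag{4.17}$$

We can have the following GMD:

$$\mathbf{J} \triangleq \begin{bmatrix} \mathbf{U} [\mathbf{\Sigma} \,\vdots\, \mathbf{0}_{K \times (L-K)}] \\ \sqrt{\alpha}\, \mathbf{I}_L \end{bmatrix} = \mathbf{Q}_J \mathbf{R}_J \mathbf{P}_J^*, \tag{4.18}$$

where $\mathbf{R}_J \in \mathbb{R}^{L \times L}$ is an upper triangular matrix with equal diagonal elements,
$\mathbf{Q}_J \in \mathbb{C}^{(M_r + L) \times L}$ is semi-unitary, and $\mathbf{P}_J \in \mathbb{C}^{L \times L}$ is unitary. Inserting (4.18) into (4.17)
yields

$$\mathbf{G}_a = \begin{bmatrix} \mathbf{I}_{M_r} & \mathbf{0} \\ \mathbf{0} & \mathbf{\Omega}_0 \end{bmatrix} \mathbf{Q}_J \mathbf{R}_J \mathbf{P}_J^* \mathbf{\Omega}_0^*. \tag{4.19}$$

Let $\mathbf{\Omega}_0 = \mathbf{P}_J^*$ and

$$\mathbf{Q}_{G_a} = \begin{bmatrix} \mathbf{I}_{M_r} & \mathbf{0} \\ \mathbf{0} & \mathbf{\Omega}_0 \end{bmatrix} \mathbf{Q}_J. \tag{4.20}$$

Then (4.19) can be rewritten as $\mathbf{G}_a = \mathbf{Q}_{G_a} \mathbf{R}_J$, which is the QR decomposition of $\mathbf{G}_a$.
The semi-unitary matrix $\mathbf{\Omega}$ associated with $\mathbf{G}_a$ consists of the first $K$ columns of $\mathbf{\Omega}_0$
(or $\mathbf{P}_J^*$). ■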
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From  Lemma  4.2.1  and  Lemma  4.1.2,  we  conclude  that  we  can  always  combine  a 
linear  precoder  and  the  MMSE-VBLAST  detector  to  uniformly  decompose  a  MIMO 
channel into L ≥ K subchannels with the same output SINRs. According to Corollary
4.1.3,  we  can  further  conclude  that  the  channel  decomposition  is  strictly  capacity  lossless. 
We  refer  to  the  scheme  demonstrated  in  Lemma  4.2.1  as  UCD-VBLAST. 

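The proof of Lemma 4.2.1 is insightful. Indeed, given the SVD of $\mathbf{H}$ and the
"water filling" level $\mathbf{\Phi}^{1/2}$, we only need to calculate the GMD given in (4.18). Then we
immediately obtain the linear precoder $\mathbf{F} = \mathbf{V} \mathbf{\Phi}^{1/2} \mathbf{\Omega}^*$, where $\mathbf{\Omega}$ consists of the first $K$
columns of $\mathbf{P}_J^*$. Let $\mathbf{Q}^u_{G_a}$ denote the first $M_r$ rows of $\mathbf{Q}_{G_a}$, or equivalently the first $M_r$
rows of $\mathbf{Q}_J$ (cf. (4.20)). According to Lemma 4.1.1, the nulling vectors are calculated as

$$\mathbf{w}_i = r_{J,ii}^{-1}\, \mathbf{q}_{G_a,i}, \quad i = 1, 2, \ldots, L, \tag{4.21}$$

where $r_{J,ii}$ is the ith diagonal element of $\mathbf{R}_J$ and $\mathbf{q}_{G_a,i}$ is the ith column of $\mathbf{Q}^u_{G_a}$.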

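Some observations can help reduce the computational complexity. For any matrix
$\mathbf{B} \in \mathbb{C}^{M \times N}$ with SVD $\mathbf{B} = \mathbf{U}_B \mathbf{\Lambda}_B \mathbf{V}_B^*$ and the augmented matrix with SVD

$$\mathbf{A} = \begin{bmatrix} \mathbf{B} \\ \sqrt{\alpha}\, \mathbf{I}_N \end{bmatrix} = \mathbf{U}_A \mathbf{\Lambda}_A \mathbf{V}_A^*, \tag{4.22}$$

the diagonal elements of $\mathbf{\Lambda}_A$ and $\mathbf{\Lambda}_B$, i.e., $\lambda_{A,i}$ and $\lambda_{B,i}$, satisfy

$$\lambda_{A,i} = \sqrt{\lambda_{B,i}^2 + \alpha}. \tag{4.23}$$

Moreover

$$\mathbf{U}_A = \begin{bmatrix} \mathbf{U}_B \mathbf{\Lambda}_B \mathbf{\Lambda}_A^{-1} \\ \sqrt{\alpha}\, \mathbf{V}_B \mathbf{\Lambda}_A^{-1} \end{bmatrix} \quad \text{and} \quad \mathbf{V}_A = \mathbf{V}_B. \tag{4.24}$$

Hence the SVD of $\mathbf{J}$ defined in (4.18) is

$$\mathbf{J} = \begin{bmatrix} \mathbf{U} [\mathbf{\Sigma} \,\vdots\, \mathbf{0}_{K \times (L-K)}]\, \tilde{\mathbf{\Sigma}}^{-1} \\ \sqrt{\alpha}\, \tilde{\mathbf{\Sigma}}^{-1} \end{bmatrix} \tilde{\mathbf{\Sigma}}\, \mathbf{I}_L, \tag{4.25}$$

where $\tilde{\mathbf{\Sigma}}$ is an $L \times L$ diagonal matrix with the diagonal elements

$$\tilde{\sigma}_i = \sqrt{\sigma_i^2 + \alpha}, \quad 1 \le i \le K, \tag{4.26a}$$

and

$$\tilde{\sigma}_i = \sqrt{\alpha}, \quad K + 1 \le i \le L. \tag{4.26b}$$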
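The shortcut of (4.22) to (4.24) is easy to confirm numerically. The sketch below assumes NumPy and a randomly drawn test matrix `B` of illustrative size; it compares the singular values of the augmented matrix with the prediction of (4.23) and rebuilds the augmented matrix from the factors in (4.24).

```python
import numpy as np

rng = np.random.default_rng(2)
M, N, alpha = 5, 3, 0.7            # illustrative sizes and alpha (assumed values)
B = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))

Ub, lam_b, Vbh = np.linalg.svd(B, full_matrices=False)
A = np.vstack([B, np.sqrt(alpha) * np.eye(N)])          # augmented matrix (4.22)

lam_a = np.linalg.svd(A, compute_uv=False)              # direct computation
lam_a_pred = np.sqrt(lam_b**2 + alpha)                  # prediction (4.23)

# Left singular vectors predicted by (4.24); V_A = V_B.
Ua_pred = np.vstack([Ub * (lam_b / lam_a_pred),
                     np.sqrt(alpha) * Vbh.conj().T / lam_a_pred])
A_rebuilt = (Ua_pred * lam_a_pred) @ Vbh

print(np.max(np.abs(lam_a - lam_a_pred)), np.max(np.abs(A - A_rebuilt)))
```

Only the SVD of the smaller matrix is ever computed in the shortcut; the second SVD call here exists purely to verify the prediction.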
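Applying the GMD matrix decomposition algorithm given in Section 6.2 to $\tilde{\mathbf{\Sigma}}$ yields

$$\tilde{\mathbf{\Sigma}} = (\mathbf{Q}_1 \mathbf{Q}_2 \cdots \mathbf{Q}_{L-1})\, \mathbf{R}_J\, (\mathbf{P}_{L-1}^T \mathbf{P}_{L-2}^T \cdots \mathbf{P}_1^T). \tag{4.27}$$

Hence

$$\mathbf{J} = \begin{bmatrix} \mathbf{U} [\mathbf{\Sigma} \,\vdots\, \mathbf{0}_{K \times (L-K)}]\, \tilde{\mathbf{\Sigma}}^{-1} \\ \sqrt{\alpha}\, \tilde{\mathbf{\Sigma}}^{-1} \end{bmatrix} (\mathbf{Q}_1 \mathbf{Q}_2 \cdots \mathbf{Q}_{L-1})\, \mathbf{R}_J\, (\mathbf{P}_{L-1}^T \mathbf{P}_{L-2}^T \cdots \mathbf{P}_1^T). \tag{4.28}$$

Then the linear precoder has the form:

$$\mathbf{F} = \mathbf{V} [\mathbf{\Phi}^{1/2} \,\vdots\, \mathbf{0}_{K \times (L-K)}]\, \mathbf{P}_1 \mathbf{P}_2 \cdots \mathbf{P}_{L-1}. \tag{4.29}$$

The nulling vectors are calculated according to (4.21) with $r_{J,ii} = \big( \prod_{k=1}^{L} \tilde{\sigma}_k \big)^{1/L}$, and

$$\mathbf{Q}^u_{G_a} = \mathbf{U} [\mathbf{\Sigma} \,\vdots\, \mathbf{0}_{K \times (L-K)}]\, \tilde{\mathbf{\Sigma}}^{-1} \mathbf{Q}_1 \mathbf{Q}_2 \cdots \mathbf{Q}_{L-1}. \tag{4.30}$$

Note that $\mathbf{Q}_l$ and $\mathbf{P}_l$, $l = 1, 2, \ldots, L-1$, are Givens rotation matrices and hence
calculating (4.29) and (4.30) needs $O(M_t(K + L))$ and $O(M_r(K + L))$ flops, respectively.
We summarize the UCD-VBLAST scheme as follows.¹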

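Table 4-1: The UCD-VBLAST scheme

step   operation                                                    flops
1      Compute SVD $\mathbf{H} = \mathbf{U}\mathbf{\Lambda}\mathbf{V}^*$                                  $O(M_t M_r K)$
2      Calculate $\mathbf{\Phi}^{1/2}$ using (2.9)                                $O(K^2)$
3      $\mathbf{\Sigma} = \mathbf{\Lambda}\mathbf{\Phi}^{1/2}$                                             $O(K)$
4      Obtain $\tilde{\mathbf{\Sigma}}$ using (4.26)                                    $O(K)$
5      Apply GMD to $\tilde{\mathbf{\Sigma}}$ to obtain (4.27)                          $O(L^2)$
6      Generate $\mathbf{F}$ using (4.29)                                     $O(M_t(K + L))$
7      Compute $\mathbf{Q}^u_{G_a}$ using (4.30)                                  $O(M_r(K + L))$
8      Calculate $\{\mathbf{w}_i\}_{i=1}^{L}$ using (4.21)                            $O(M_r L)$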

Clearly, our UCD-VBLAST scheme has comparable computational complexity
to the SVD based linear transceiver designs. An observation relevant to practical
implementations is as follows. Note that the receiver does not have to calculate Step 6 since
CSIT is available and the transmitter can run Steps 1 to 6. However, if the receiver
calculates F, which only takes a small number of flops, and feeds it back to the transmitter,
then the transmitter is relieved from calculating the SVDs. Hence in FDD systems, it is
preferable to feed back F, rather than H, to the transmitter. In TDD systems, there are
still advantages for feeding back F since this reduces by approximately half the overall
computational complexity.

¹ Steps 5-7 can be processed simultaneously as in the GMD algorithm.

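We conclude the discussions of the UCD-VBLAST scheme by deriving the SINR of
each subchannel. Note that the diagonal elements of $\mathbf{R}_J$ are

$$r_{J,ii} = \Big( \prod_{k=1}^{L} \tilde{\sigma}_k \Big)^{1/L}, \quad i = 1, 2, \ldots, L, \tag{4.31}$$

which is the geometric mean of the diagonal elements of $\tilde{\mathbf{\Sigma}}$. It follows from (4.26) that

$$r_{J,ii}^2 = \Big( \alpha^{L-K} \prod_{k=1}^{K} (\sigma_k^2 + \alpha) \Big)^{1/L} = \alpha \Big( \prod_{k=1}^{K} (\alpha^{-1} \sigma_k^2 + 1) \Big)^{1/L}. \tag{4.32}$$

According to Lemma 4.1.2,

$$\rho_i = \rho \triangleq \Big( \prod_{k=1}^{K} (\alpha^{-1} \sigma_k^2 + 1) \Big)^{1/L} - 1, \quad i = 1, 2, \ldots, L. \tag{4.33}$$

Hence

$$\sum_{i=1}^{L} \log_2(1 + \rho_i) = \sum_{i=1}^{K} \log_2(1 + \alpha^{-1} \sigma_i^2) = \sum_{i=1}^{K} \log_2(1 + \alpha^{-1} \lambda_{H,i}^2 \phi_i), \tag{4.34}$$

which is exactly the $C_{IT}$ in (2.10). Hence UCD-VBLAST is strictly capacity lossless.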
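The whole UCD-VBLAST construction can be sketched end to end. The code below is an illustrative sketch, not the implementation of Section 6.2: the helper `gmd` is a straightforward dense-matrix version of the permutation-plus-Givens-rotation idea, written for clarity rather than for the flop counts of Table 4-1. Uniform power loading is used as a stand-in for the water-filling solution of (2.9), and all sizes and the value of α are assumed.

```python
import numpy as np

def gmd(U, s, V):
    """Sketch of a geometric mean decomposition from an SVD A = U diag(s) V^H.

    Returns Q, R, P with A = Q R P^H, where R is real upper triangular and
    every diagonal entry equals the geometric mean of s (s must be positive).
    """
    n = len(s)
    Q, P, R = U.astype(complex), V.astype(complex), np.diag(s).astype(float)
    gm = np.exp(np.mean(np.log(s)))
    for k in range(n - 1):
        order = np.argsort(np.diag(R)[k:])
        p, q = k + order[-1], k + order[0]      # largest and smallest remaining
        rest = [i for i in range(k, n) if i not in (p, q)]
        perm = list(range(k)) + [p, q] + rest   # bring them to positions k, k+1
        R, Q, P = R[np.ix_(perm, perm)], Q[:, perm], P[:, perm]
        d1, d2 = R[k, k], R[k + 1, k + 1]       # d1 >= gm >= d2
        if d1 - d2 > 1e-12 * d1:
            c = np.sqrt(np.clip((gm**2 - d2**2) / (d1**2 - d2**2), 0.0, 1.0))
        else:
            c = 1.0
        sn = np.sqrt(1.0 - c**2)
        G1 = np.array([[c * d1, -sn * d2], [sn * d2, c * d1]]) / gm
        G2 = np.array([[c, -sn], [sn, c]])
        R[k:k + 2, :] = G1.T @ R[k:k + 2, :]    # left rotation
        R[:, k:k + 2] = R[:, k:k + 2] @ G2      # right rotation
        Q[:, k:k + 2] = Q[:, k:k + 2] @ G1
        P[:, k:k + 2] = P[:, k:k + 2] @ G2
    return Q, R, P

rng = np.random.default_rng(3)
Mr, Mt, L, alpha = 4, 4, 6, 0.1                 # assumed sizes, L >= K
H = (rng.standard_normal((Mr, Mt)) + 1j * rng.standard_normal((Mr, Mt))) / np.sqrt(2)
U, lam, Vh = np.linalg.svd(H, full_matrices=False)
K = len(lam)
phi = np.full(K, 1.0 / K)                       # uniform loading (stand-in for (2.9))
sigma = lam * np.sqrt(phi)                      # diagonal of Sigma

# J of (4.18) and its SVD factors from (4.25)-(4.26).
Sig0 = np.hstack([np.diag(sigma), np.zeros((K, L - K))])
J = np.vstack([U @ Sig0, np.sqrt(alpha) * np.eye(L)])
sigma_t = np.sqrt(np.concatenate([sigma**2, np.zeros(L - K)]) + alpha)
Q_J, R_J, P_J = gmd(J / sigma_t, sigma_t, np.eye(L))

# Precoder (4.29) and the resulting virtual channel.
F = Vh.conj().T @ np.hstack([np.diag(np.sqrt(phi)), np.zeros((K, L - K))]) @ P_J
G = H @ F

# Per-layer MMSE-VBLAST SINRs from (4.8), applied to the columns of G.
rho = np.zeros(L)
for i in range(L):
    Gp = G[:, :i]
    Ci = Gp @ Gp.conj().T + alpha * np.eye(Mr)
    rho[i] = np.real(G[:, i].conj() @ np.linalg.solve(Ci, G[:, i]))

rho_pred = np.prod(1 + sigma**2 / alpha) ** (1 / L) - 1     # (4.33)
print(rho, rho_pred)
```

All L entries of `rho` should coincide with `rho_pred`, and L·log(1 + ρ) reproduces the sum over the K eigen subchannels in (4.34).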

4.3  UCD-DP 

As  a  dual  form  of  UCD-VBLAST,  the  UCD  scheme  can  be  implemented  by  using 
DP  precoding,  which  we  refer  to  as  UCD-DP.  For  UCD-DP,  a  direct  construction  of 
the  linear  precoder  F  as  done  in  Section  4.2  is  not  obvious.  Instead,  we  exploit  the 
uplink-downlink  duality  revealed  in  [49]  to  obtain  UCD-DP. 

We  convert  the  UCD-DP  problem  into  the  UCD-VBLAST  problem  in  the  reverse 
channel  where  the  roles  of  the  transmitter  and  receiver  are  exchanged 

y  =  H*x  +  z.  (4.35) 

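The UCD-VBLAST scheme can be applied to the channel of (4.35), which yields
the precoder $\mathbf{F}_{rev}$ and the equalizer $\{\mathbf{w}_i\}_{i=1}^{L}$ as in (4.29) and (4.21), respectively.
Normalize $\{\mathbf{w}_i\}_{i=1}^{L}$ to be of unit Euclidean norm, which we denote as $\{\tilde{\mathbf{w}}_i\}_{i=1}^{L}$. Let
$\tilde{\mathbf{W}} = [\tilde{\mathbf{w}}_1, \ldots, \tilde{\mathbf{w}}_L]$. According to the uplink-downlink duality, the precoder of UCD-DP should
be $\mathbf{F} = \tilde{\mathbf{W}} \mathbf{D}_q$, where $\mathbf{D}_q$ is diagonal with the diagonal elements $\{\sqrt{q_i}\}_{i=1}^{L}$, which will be
determined based on (4.40) below. We use $\mathbf{F}_{rev}$, the linear precoder in the reverse
channel, as the linear equalizer. Then the equivalent MIMO channel is

$$\mathbf{y} = \mathbf{F}_{rev}^* \mathbf{H} \tilde{\mathbf{W}} \mathbf{D}_q \mathbf{x} + \mathbf{F}_{rev}^* \mathbf{z}, \tag{4.36}$$

where the ith scalar subchannel of the MIMO channel is

$$y_i = \mathbf{f}_i^* \mathbf{H} \tilde{\mathbf{w}}_i \sqrt{q_i}\, x_i + \sum_{j=i+1}^{L} \mathbf{f}_i^* \mathbf{H} \tilde{\mathbf{w}}_j \sqrt{q_j}\, x_j + \sum_{j=1}^{i-1} \mathbf{f}_i^* \mathbf{H} \tilde{\mathbf{w}}_j \sqrt{q_j}\, x_j + \mathbf{f}_i^* \mathbf{z}. \tag{4.37}$$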

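Applying the dirty paper precoder to $x_i$ and treating $\sum_{j=1}^{i-1} \mathbf{f}_i^* \mathbf{H} \tilde{\mathbf{w}}_j \sqrt{q_j}\, x_j$ as the
interference known at the transmitter (note that here we precode the first layer first while for
UCD-VBLAST, we detect the Lth layer first), we obtain an equivalent subchannel

$$y_i = \mathbf{f}_i^* \mathbf{H} \tilde{\mathbf{w}}_i \sqrt{q_i}\, x_i + \sum_{j=i+1}^{L} \mathbf{f}_i^* \mathbf{H} \tilde{\mathbf{w}}_j \sqrt{q_j}\, x_j + \mathbf{f}_i^* \mathbf{z} \tag{4.38}$$

with SINR

$$\rho_i = \frac{q_i |\mathbf{f}_i^* \mathbf{H} \tilde{\mathbf{w}}_i|^2}{\alpha \|\mathbf{f}_i\|^2 + \sum_{j=i+1}^{L} q_j |\mathbf{f}_i^* \mathbf{H} \tilde{\mathbf{w}}_j|^2}, \quad i = 1, 2, \ldots, L. \tag{4.39}$$

The next step is to calculate $\{q_i\}_{i=1}^{L}$ such that $\rho_i = \rho$, $1 \le i \le L$, where $\rho$ is as defined
in (4.33). Let $a_{ij} = |\mathbf{f}_i^* \mathbf{H} \tilde{\mathbf{w}}_j|^2$. Then (4.39) can be represented in the matrix form

$$\begin{bmatrix} a_{11} & -\rho a_{12} & \cdots & -\rho a_{1L} \\ 0 & a_{22} & \cdots & -\rho a_{2L} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a_{LL} \end{bmatrix} \begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_L \end{bmatrix} = \rho \alpha \begin{bmatrix} \|\mathbf{f}_1\|^2 \\ \|\mathbf{f}_2\|^2 \\ \vdots \\ \|\mathbf{f}_L\|^2 \end{bmatrix}. \tag{4.40}$$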


It is easy to see that $q_i > 0$, $1 \le i \le L$. It is proven in [49] that $\sum_{i=1}^{L} q_i = \mathrm{tr}(\mathbf{F}\mathbf{F}^*) =
\mathrm{tr}(\mathbf{F}_{rev}\mathbf{F}_{rev}^*)$. That is, the UCD-DP needs exactly the same power as the UCD-VBLAST
to obtain L identical subchannels with SINR $\rho$.

The UCD-DP using the Tomlinson-Harashima precoder leads to an input power
increase of $\frac{M}{M-1}$ for M-QAM symbols. Nevertheless, for a system with high
dimensionality and/or using a large constellation, UCD-DP is a better choice than UCD-VBLAST
since it is free of error propagation.
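Because the system (4.40) is upper triangular, the powers can be found by back substitution, starting from the last layer. The sketch below illustrates only this step. The function name `dp_powers` is hypothetical, and `F_rev` and `W_tilde` are random stand-ins rather than the outputs of UCD-VBLAST on the reverse channel, so the power identity proven in [49] is not expected to hold here; the check is only that every subchannel attains the target SINR.

```python
import numpy as np

def dp_powers(F_rev, H, W_tilde, rho, alpha):
    """Solve the upper-triangular system (4.40) for q by back substitution."""
    A = np.abs(F_rev.conj().T @ H @ W_tilde) ** 2   # a_ij = |f_i^* H w_j|^2
    f_norm2 = np.sum(np.abs(F_rev) ** 2, axis=0)    # ||f_i||^2
    L = A.shape[0]
    q = np.zeros(L)
    for i in range(L - 1, -1, -1):
        interference = A[i, i + 1:] @ q[i + 1:]     # layers precoded after layer i
        q[i] = rho * (alpha * f_norm2[i] + interference) / A[i, i]
    return q, A, f_norm2

rng = np.random.default_rng(4)
Mr, Mt, L, alpha, rho = 3, 4, 3, 0.1, 2.0           # illustrative values (assumed)
H = (rng.standard_normal((Mr, Mt)) + 1j * rng.standard_normal((Mr, Mt))) / np.sqrt(2)
F_rev = rng.standard_normal((Mr, L)) + 1j * rng.standard_normal((Mr, L))     # stand-in
W_tilde = rng.standard_normal((Mt, L)) + 1j * rng.standard_normal((Mt, L))   # stand-in
W_tilde /= np.linalg.norm(W_tilde, axis=0)          # unit-norm columns

q, A, f_norm2 = dp_powers(F_rev, H, W_tilde, rho, alpha)

# Recompute the SINR of (4.39) for every layer with the solved powers.
sinr = np.array([q[i] * A[i, i] / (alpha * f_norm2[i] + A[i, i + 1:] @ q[i + 1:])
                 for i in range(L)])
print(q, sinr)
```

Each step of the recursion divides a positive quantity by a positive diagonal entry, which is why all the powers come out positive.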

4.4    Performance  Analysis 
4.4.1    Diversity  Gain  Analysis 

An  important  performance  metric  is  diversity  gain,  which  is  defined  as  follows  [16]. 
Definition 4.4.1 Let $P_e(\rho)$ denote the average error probability of a scheme at SNR $\rho$.
The diversity gain of the scheme is

$$d = -\lim_{\rho \to \infty} \frac{\log P_e(\rho)}{\log \rho}. \tag{4.41}$$

The  diversity  gain  measures  how  fast  the  error  probability  decays  with  SNR.  We  note 
that  diversity  gain  is  usually  discussed  without  assuming  the  availability  of  CSIT.  The 
reason  is  that  diversity  gain  is  a  concept  associated  with  channel  outage,  i.e.,  the  case 
where  the  channel  is  too  poor  to  support  a  target  data  rate.  Using  CSIT,  one  can  adjust 
the  transmission  rate  to  avoid  channel  outage.  However,  if  the  rate  is  fixed,  which  is 
desirable  in  practice,  we  can  also  use  diversity  gain  as  a  performance  measure  of  the 
transceiver  designs.  Based  on  this  observation,  we  analyze  the  diversity  gains  of  the 
UCD  and  GMD  schemes.  The  result  is  summarized  in  the  following  proposition. 
Proposition 4.4.2 Consider the i.i.d. Rayleigh flat fading MIMO channel defined in
(4.1). Let $M = \max(M_t, M_r)$ and $m = \min(M_t, M_r)$. The diversity gains of the GMD
and the UCD schemes are

$$d_{GMD}(M, m) = (M - m + 1)\,m \quad \text{and} \quad d_{UCD}(M, m) = Mm, \tag{4.42}$$

respectively.

We have applied the typical error event analysis (see [16], [50]) to obtain (4.42). The
details are relegated to Appendix B. We see that although UCD has a negligible coding
gain over the GMD scheme at high SNR, its diversity gain exceeds that of GMD
by $m^2 - m$. An interesting point to make is that water filling does not help improve
diversity gains. Hence at high SNR, water filling is useless in both capacity and diversity
aspects.

Given the fact that the GMD scheme is asymptotically capacity lossless at high
SNR, it is rather surprising to see the large diversity loss of GMD compared with UCD.
We give an intuitive explanation as follows. Note that diversity gain is determined by
the typical error events in which the MIMO channel is in deep fade. Namely, the diversity
gain of a scheme depends on its ability to deal with bad channels. A deeply faded
channel with high input SNR is equivalent to a "normal" channel with low SNR, in
which scenario the GMD scheme is far less efficient than UCD as shown in the numerical
examples. Consequently, GMD has less diversity gain than UCD.
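For concrete antenna counts, the two diversity gains of (4.42) can be tabulated directly. The helper below is a small illustrative sketch (the function name `diversity_gains` is hypothetical) that simply evaluates the two expressions of the proposition.

```python
def diversity_gains(Mt, Mr):
    """Evaluate (4.42): diversity gains of GMD and UCD for an Mt x Mr channel."""
    M, m = max(Mt, Mr), min(Mt, Mr)
    d_gmd = (M - m + 1) * m
    d_ucd = M * m
    return d_gmd, d_ucd

for Mt, Mr in [(2, 4), (4, 4), (10, 10)]:
    d_gmd, d_ucd = diversity_gains(Mt, Mr)
    print(Mt, Mr, d_gmd, d_ucd, d_ucd - d_gmd)
```

For Mt = Mr = 4 this gives 4 for GMD and 16 for UCD, a gap of m(m − 1) = 12, so the gap grows quadratically with the smaller antenna count.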
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4.4.2    Further  Remarks 

Besides  the  larger  coding  gain  at  low  SNR  and  an  improved  diversity  gain  at  high 
SNR,  the  UCD  scheme  enjoys  more  flexibility  than  the  GMD  scheme.  For  a  rank 
K  MIMO  channel,  the  GMD  scheme  can  support  no  more  than  K  independent  data 
streams.  However,  the  UCD  scheme  can  decompose  a  rank  K  MIMO  channel  into 
L ≥ K identical subchannels, and L is not even limited by the dimensionality of the
channel  matrix.  This  property  of  the  UCD  scheme  enables  one  to  achieve  high  data  rate 
transmission  using  small  constellations  as  demonstrated  in  the  numerical  examples. 

The  UCD  scheme  also  suggests  new  ways  of  channel  decomposition  which  are  much 
more flexible than the conventional SVD based ones. Indeed, one may choose the
permutation matrices and Givens rotations to achieve a wide variety of channel decompositions
with some prescribed SINRs as suggested by the generalized triangular decomposition
(GTD) (see Chapter 6, [51], [52]). This idea is developed in Chapter 5.

Finally,  we  link  UCD  with  DBLAST  [2],  which  has  been  shown  to  be  able  to  achieve 
the  optimal  tradeoff  between  the  channel  diversity  and  multiplexing  [16].  We  observe 
that  each  diagonal  layer  of  DBLAST  can  be  viewed  as  the  interleaving  of  the  vertical 
layers  of  VBLAST  in  the  space-time  domain  and  each  diagonal  layer  can  be  regarded 
as  a  virtual  subchannel  with  the  same  capacity.  However,  DBLAST  requires  short  and 
powerful  error  correcting  coding  to  make  the  virtual  subchannel  work  as  a  "real"  one. 
This  is  a  major  difficulty  for  the  implementation  of  DBLAST.  In  addition,  DBLAST 
suffers from boundary wastage. In contrast, our UCD scheme, by exploiting CSIT,
applies  interleaving  (via  the  Givens  rotations  and  permutations)  in  the  space  domain 
only.  This  makes  the  UCD  scheme  free  from  the  boundary  wastage.  Moreover,  the 
UCD  scheme  is  decoupled  from  coding  procedures.  Indeed,  UCD  can  be  concatenated 
with  any  error  correcting  code.  Furthermore,  UCD  makes  it  easier  to  design  the  coding 
scheme  since  UCD  decomposes  a  MIMO  channel  into  multiple  subchannels  with  identical 
capacities.  Thus  in  a  slowly  time  varying  channel,  UCD  is  much  easier  to  implement 
than  DBLAST  despite  their  duality.  This  manifests  clearly  the  values  of  CSIT. 

4.5    Numerical  Examples 

We  present  next  several  numerical  examples  to  demonstrate  the  effectiveness  of  the 
UCD  scheme. 

In the first example, we assume Rayleigh independent flat fading channels with
Mt = 10 and Mr = 10. We compare the channel capacity using the UCD and GMD
schemes. The complementary cumulative distribution functions (CCDF) of the capacity
drawn from 2000 Monte Carlo realizations of H are shown in Figure 4-1. We see that
the UCD scheme outperforms the GMD scheme significantly at low SNR although the
difference becomes smaller at higher SNR.

Figure  4-2  shows  the  CCDFs  of  the  channel  capacities  of  a  5  x  5  independent 
Rayleigh  flat  fading  channel  with  SNR  equal  to  25  dB.  The  five  thin  dashed  curves 
denote  the  channel  capacities  of  the  five  subchannels  obtained  via  SVD  plus  water 
filling.  Note  that  the  leftmost  thin  dashed  curve  crosses  the  vertical  axis  at  a  value 
less  than  one,  which  means  that  the  worst  subchannel  (corresponding  to  the  smallest 
singular  value  of  the  channel  matrix)  is  sometimes  discarded  by  water  filling.  The  thick 
solid  line  is  the  CCDF  of  the  capacity  of  the  L  =  5  subchannels  obtained  via  UCD.  All 
these  subchannels  have  the  same  capacity.  As  discussed  in  Section  4.2,  a  rank  K  MIMO 
channel can be decomposed into L ≥ K subchannels. The thin solid line represents
the case where a MIMO channel is decomposed into 7 identical subchannels using the
UCD scheme. Figure 4-2 demonstrates the advantages of our UCD scheme over the
conventional "SVD plus bit allocation" scheme (see, e.g., [19]). The channel capacities
of the 5 subchannels obtained via SVD plus water filling range from 0 to about 11
bps/Hz,  which  suggests  that  the  BPSK  or  QPSK  modulation  should  be  used  to  match 
the  capacity  of  the  worst  subchannel  and  something  like  1024  or  2048  QAM  to  the  best 
subchannel.   This  bit  allocation  significantly  increases  the  modulation/demodulation 
complexity.  Using  GMD  or  UCD,  we  can  decompose  a  rank  5  MIMO  channel  into  5 
subchannels  and  hence  the  same  constellation  with  a  reasonable  size,  say  128-QAM,  can 
be  used  to  reap  most  of  the  channel  capacity.  The  UCD  scheme  can  do  even  better. 
In  this  example,  after  decomposing  a  MIMO  channel  into  7  subchannels  via  UCD,  we 
can  apply  a  small  to  moderate  constellation,  say  16-QAM  or  64-QAM,  to  achieve  the 
channel  capacity. 

In  the  third  example,  we  assume  Rayleigh  independent  flat  fading  channels  with 
Mt = 4 and Mr = 4. We compare the BER performance of the GMD and UCD
schemes  along  with  the  conventional  MMSE-VBLAST  with  optimal  detection  ordering 
in  Figure  4-3.  We  see  that  both  GMD  and  UCD  outperform  the  conventional  VBLAST 
detector  significantly.  Moreover,  the  BER  vs.  SNR  lines  of  the  GMD  and  UCD  schemes 
have  much  steeper  decreasing  slopes,  which  means  much  better  diversity  gains,  than  the 
conventional  VBLAST.  The  diversity  gains  of  the  GMD  and  UCD  schemes  are  4  and  16, 
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respectively.  While  there  is  a  noticeably  larger  diversity  gain  for  UCD  compared  with 
GMD  as  shown  in  Figure  4-3,  the  difference  is  not  as  drastic  as  the  theoretical  prediction. 
It  is  because  the  input  SNR  is  not  high  enough  to  validate  the  approximations  made  in 
the  typical  error  event  analyses  (see  Appendix  B). 

In  the  final  example,  we  compare  the  BER  performance  of  UCD-VBLAST  and 
UCD-DP  in  the  scenario  of  a  10  x  10  Rayleigh  flat  fading  channel.  To  present  a  bench- 
mark, we  also  include  UCD-genie  as  the  imaginary  scenario  where  at  each  layer,  a  genie 
would  eliminate  the  influence  of  erroneous  detections  from  the  previous  layers  when  using 
UCD-VBLAST.  Figure  4-4  shows  that  UCD-VBLAST  may  suffer  from  some  small  BER 
degradations  caused  by  error  propagation  (about  0.5  dB  for  BER  =  10"^)  compared  with 
UCD-genie.  The  UCD-DP,  on  the  contrary,  is  free  of  error  propagation  and  hence  has 
BER  performance  very  close  to  that  of  UCD-genie.  The  slight  SNR  loss  of  UCD-DP 
is  mainly  due  to  the  inherent  power-amplification  effect  of  the  Tomlinson-Harashima 
precoder. 

4.6  Conclusions 

Based  on  the  GMD  matrix  decomposition  algorithm  and  the  closed-form  represen- 
tation of  the  MMSE-VBLAST  detector,  we  have  introduced  the  UCD  scheme  for  MIMO 
communications  that  can  decompose  a  MIMO  channel  into  multiple  subchannels  with 
identical  capacities  in  a  capacity  lossless  manner.  We  have  proposed  two  versions  of  the 
UCD scheme, i.e., UCD-VBLAST and UCD-DP. The UCD scheme can provide much
convenience  for  the  subsequent  modulation/demodulation  and  coding/decoding  proce- 
dures due  to  obviating  the  need  of  bit  allocation.  We  have  also  shown  that  UCD  can 
achieve  the  maximal  diversity  gain.  The  simulations  show  that  the  UCD  scheme  has 
excellent  performance  even  without  the  use  of  error  correcting  codes.  The  UCD  scheme 
suggests  a  new  way  of  channel  decomposition  which  enjoys  much  more  flexibility  than 
the  conventional  SVD  based  ones. 


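Appendix A
Proof of Lemma 4.1.2

Rewrite (4.5) as

$$\mathbf{H}_a = \mathbf{Q}_{H_a} \mathbf{R}_{H_a} = \begin{bmatrix} \mathbf{Q}^u_{H_a} \\ \mathbf{Q}^l_{H_a} \end{bmatrix} \mathbf{R}_{H_a}. \tag{4.43}$$

Let $\mathbf{H}_{a,i}$ ($\mathbf{H}_i$) denote the submatrix containing the first $i$ columns of $\mathbf{H}_a$ ($\mathbf{H}$) and $\mathbf{h}_{a,i}$
($\mathbf{h}_i$) the ith column. Then

$$\mathbf{H}_{a,i} = \begin{bmatrix} \mathbf{H}_i \\ \sqrt{\alpha}\, \mathbf{I}_i \\ \mathbf{0}_{(M_t - i) \times i} \end{bmatrix}, \qquad \mathbf{h}_{a,i} = \begin{bmatrix} \mathbf{h}_i \\ \mathbf{0}_{(i-1) \times 1} \\ \sqrt{\alpha} \\ \mathbf{0}_{(M_t - i) \times 1} \end{bmatrix}. \tag{4.44}$$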


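For the QR decomposition $\mathbf{H}_a = \mathbf{Q}_{H_a} \mathbf{R}_{H_a}$, the geometric implication of $r_{H_a,ii}$ is
the component of $\mathbf{h}_{a,i}$ projected onto the subspace spanned by the ith column of $\mathbf{Q}_{H_a}$,
i.e., $\mathbf{q}_{H_a,i}$. Note that $\mathbf{q}_{H_a,i}$ is orthogonal to the subspace spanned by $\{\mathbf{q}_{H_a,j}\}_{j=1}^{i-1}$ or,
equivalently, the column space of $\mathbf{H}_{a,i-1}$. Hence

$$r_{H_a,ii} = \left\| \mathbf{P}^{\perp}_{H_{a,i-1}} \mathbf{h}_{a,i} \right\|, \tag{4.45}$$

where $\mathbf{P}^{\perp}_{A}$ stands for the orthogonal projection onto the null space of $\mathbf{A}^*$. Therefore

$$r_{H_a,ii}^2 = \mathbf{h}_{a,i}^* \left[ \mathbf{I} - \mathbf{H}_{a,i-1} \left( \mathbf{H}_{a,i-1}^* \mathbf{H}_{a,i-1} \right)^{-1} \mathbf{H}_{a,i-1}^* \right] \mathbf{h}_{a,i}. \tag{4.46}$$

Inserting (4.44) into (4.46) yields

$$r_{H_a,ii}^2 = \alpha + \mathbf{h}_i^* \left[ \mathbf{I} - \mathbf{H}_{i-1} \left( \mathbf{H}_{i-1}^* \mathbf{H}_{i-1} + \alpha \mathbf{I} \right)^{-1} \mathbf{H}_{i-1}^* \right] \mathbf{h}_i = \alpha + \alpha\, \mathbf{h}_i^* \left( \mathbf{H}_{i-1} \mathbf{H}_{i-1}^* + \alpha \mathbf{I} \right)^{-1} \mathbf{h}_i. \tag{4.47}$$

From (4.8), we see that

$$\rho_i = \mathbf{h}_i^* \left( \mathbf{H}_{i-1} \mathbf{H}_{i-1}^* + \alpha \mathbf{I} \right)^{-1} \mathbf{h}_i. \tag{4.48}$$

Hence $r_{H_a,ii}^2 = \alpha (1 + \rho_i)$. The lemma is proven.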


Appendix  B 
Proof  of  Proposition  4.4.2 

Without loss of generality, we assume $\mathbf{H} \in \mathbb{C}^{M \times m}$, whose entries are circularly
symmetric Gaussian with zero mean and unit variance. Consider
BPSK modulation. The average error probability of the GMD scheme is


E 


Q  (\/2pgmd)]  =  E 


=  E 


Q 


i 


(4.49) 
where the Q-function is defined as

$$Q(x) = \int_x^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-s^2/2}\, ds.$$

The diversity gain of the GMD scheme is

$$d_{GMD}(M, m) = -\lim_{\rho \to \infty} \frac{\log P_e^{GMD}}{\log \rho}. \tag{4.50}$$

For  any  QAM  constellation,  the  average  error  probability  is  similar  to  (4.49)  except  for 
some  constants  before  or  inside  the  Q-function.  Since  we  focus  on  the  high  SNR  region, 
all  these  constants  will  not  affect  the  diversity  gain  defined  in  (4.50). 
At high SNR, the typical error event is

$$\mathcal{E} = \{ \lambda_H : \bar{\lambda}_H^2 < \rho^{-1} \}. \tag{4.51}$$

It can be shown that instead of calculating (4.50), which involves complicated
integrations, we can compute the following [50, Ch. 3]:

$$d_{GMD}(M, m) = -\lim_{\rho \to \infty} \frac{\log P(\mathcal{E})}{\log \rho}. \tag{4.52}$$

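Note that

$$\bar{\lambda}_H^{2m} = \prod_{i=1}^{m} \lambda_{H,i}^2 = |\mathbf{H}^*\mathbf{H}|. \tag{4.53}$$

According to [53, Theorem 7.5.3] (with straightforward extensions from the real-valued
domain to the complex-valued domain),

$$\bar{\lambda}_H^{2m} = |\mathbf{H}^*\mathbf{H}| = \prod_{i=1}^{m} g_{M-m+i}^2, \tag{4.54}$$

where the $g_i^2$'s are independent Chi-squared random variables with probability density

$$f_{g_i^2}(x) = \frac{1}{(i-1)!}\, x^{i-1} e^{-x}, \quad x \ge 0. \tag{4.55}$$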

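Now the typical error event can be written as

$$\mathcal{E} = \bigcup_{\boldsymbol{\alpha} \in \mathcal{E}_\alpha} \left\{ \{g_{M-m+i}^2\}_{i=1}^{m} : g_{M-m+i}^2 < \rho^{-\alpha_i},\ i = 1, \ldots, m \right\}, \tag{4.56}$$

where $\mathcal{E}_\alpha = \{ \{\alpha_i\}_{i=1}^{m} : \sum_{i=1}^{m} \alpha_i > m \}$. Hence

$$P(\mathcal{E}) = \int_{\mathcal{E}_\alpha} \prod_{i=1}^{m} P\left( g_{M-m+i}^2 < \rho^{-\alpha_i} \right) d\alpha_1 \cdots d\alpha_m. \tag{4.57}$$

From (4.55), we know that as $\epsilon \to 0$,

$$P(g_i^2 < \epsilon) = \frac{\epsilon^{i}}{i!} + o(\epsilon^{i}). \tag{4.58}$$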

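Using (4.52), (4.58), and (4.57), we calculate the diversity gain as

$$d_{GMD}(M, m) = -\lim_{\rho \to \infty} \frac{\log \int_{\mathcal{E}_\alpha^+} \prod_{i=1}^{m} \rho^{-(M-m+i)\alpha_i}\, d\alpha_1 \cdots d\alpha_m}{\log \rho} = -\lim_{\rho \to \infty} \frac{\log \int_{\mathcal{E}_\alpha^+} \rho^{-\sum_{i=1}^{m} (M-m+i)\alpha_i}\, d\alpha_1 \cdots d\alpha_m}{\log \rho} \tag{4.59}$$

$$= \inf_{\mathcal{E}_\alpha^+} \sum_{i=1}^{m} (M - m + i)\,\alpha_i, \tag{4.60}$$

where $\mathcal{E}_\alpha^+ = \mathcal{E}_\alpha \cap \{\alpha_i \ge 0,\ i = 1, \ldots, m\}$. To obtain (4.60) from (4.59), we have used the
property that the integral in the numerator of (4.59) is dominated by the term with the
SNR exponent closest to zero, as $\rho \to \infty$ (see [16] for details). Here the integration is
constrained over $\mathcal{E}_\alpha^+$ because the integration over $\mathcal{E}_\alpha$ is dominated by the one over $\mathcal{E}_\alpha^+$.
The reason is as follows. Suppose only $\alpha_{n_1}, \ldots, \alpha_{n_j} > 0$, $j < m$, and the other $\alpha$'s,
$\alpha_{k_1}, \ldots, \alpha_{k_{m-j}}$, are negative. Then, as $\rho \to \infty$,

$$\prod_{i=1}^{m} P\left( g_{M-m+i}^2 < \rho^{-\alpha_i} \right) \to \prod_{t=1}^{j} P\left( g_{M-m+n_t}^2 < \rho^{-\alpha_{n_t}} \right).$$

Let $\tilde{\mathcal{E}}_\alpha^+$ denote $\{ \{\alpha_{n_i}\}_{i=1}^{j} > 0 : \sum_{i=1}^{j} \alpha_{n_i} > m - \sum_{i=1}^{m-j} \alpha_{k_i} \}$. Clearly,

$$\inf_{\tilde{\mathcal{E}}_\alpha^+} \sum_{i=1}^{j} (M - m + n_i)\,\alpha_{n_i} > \inf_{\mathcal{E}_\alpha^+} \sum_{i=1}^{m} (M - m + i)\,\alpha_i,$$

which implies that the integration over $\mathcal{E}_\alpha$ is dominated by that over $\mathcal{E}_\alpha^+$. Solving the
optimization problem of (4.60) yields

$$d_{GMD}(M, m) = (M - m + 1)\,m. \tag{4.61}$$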
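The final optimization step can be spelled out in one line. The following worked derivation is a sketch added for illustration; it uses only the objective of (4.60) and the constraints that the exponents are nonnegative and sum to at least m.

```latex
\begin{aligned}
\sum_{i=1}^{m} (M-m+i)\,\alpha_i
  &\ge (M-m+1) \sum_{i=1}^{m} \alpha_i
   \ge (M-m+1)\,m
  \qquad \text{for } \alpha_i \ge 0,\ \textstyle\sum_{i} \alpha_i \ge m, \\
\text{with equality at } \alpha_1 &= m,\ \alpha_2 = \cdots = \alpha_m = 0 .
\end{aligned}
```

The smallest coefficient, M − m + 1, belongs to the first term, so the infimum is reached by placing the entire exponent budget there.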
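Now we consider UCD. We observe that the power allocation applied to each eigen
subchannel is no greater than $\rho$. Hence the overall channel throughput of UCD is

$$\sum_{i=1}^{m} \log\left( 1 + \frac{\rho}{m} \lambda_{H,i}^2 \right) \le C_{ucd} \le \sum_{i=1}^{m} \log\left( 1 + \rho\, \lambda_{H,i}^2 \right), \tag{4.62}$$

where the left term denotes the channel throughput associated with uniform power
allocation. Applying UCD, we obtain m subchannels with the same SNR:

$$\Big( \prod_{i=1}^{m} \big( 1 + \tfrac{\rho}{m} \lambda_{H,i}^2 \big) \Big)^{1/m} - 1 \le \rho_{ucd} \le \Big( \prod_{i=1}^{m} \big( 1 + \rho\, \lambda_{H,i}^2 \big) \Big)^{1/m} - 1. \tag{4.63}$$

The typical error event is

$$\mathcal{E} = \{ \lambda_H : \rho_{ucd} < 1 \}. \tag{4.64}$$

It follows from (4.63) that

$$P_1(\rho) \triangleq P\Big\{ \Big( \prod_{i=1}^{m} \big( 1 + \tfrac{\rho}{m} \lambda_{H,i}^2 \big) \Big)^{1/m} - 1 < 1 \Big\} \ge P(\mathcal{E}) \ge P\Big\{ \Big( \prod_{i=1}^{m} \big( 1 + \rho\, \lambda_{H,i}^2 \big) \Big)^{1/m} - 1 < 1 \Big\} \triangleq P_2(\rho). \tag{4.65}$$

It is easy to see that

$$\lim_{\rho \to \infty} \frac{\log P_1(\rho)}{\log \rho} = \lim_{\rho \to \infty} \frac{\log P_2(\rho)}{\log \rho}. \tag{4.66}$$

Hence

$$\lim_{\rho \to \infty} \frac{\log P(\mathcal{E})}{\log \rho} = \lim_{\rho \to \infty} \frac{\log P_1(\rho)}{\log \rho}, \tag{4.67}$$

which implies that water filling does not help improve diversity gain.

It follows from the analyses of [16] that the UCD scheme achieves the optimal
diversity-multiplexing tradeoff. In particular, when the transmission data rate is fixed
regardless of the increase of input SNR, the diversity gain is $d_{UCD}(M, m) = Mm$.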


Figure 4-1: Complementary cumulative distribution function of the capacity of an i.i.d.
Rayleigh flat fading channel with Mt = 10 and Mr = 10. Results based on 2000 Monte
Carlo trials. SNR = (a) 0 dB, (b) 10 dB, (c) 20 dB, and (d) 30 dB.
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Figure  4-2:  Complementary  cumulative  distribution  functions  of  the  capacities  of  5 
subchannels of an i.i.d. Rayleigh flat fading channel with Mt = 5 and Mr = 5. Results
based  on  2000  Monte  Carlo  trials. 


Figure  4-3:  Uncoded  BER  performance  when  using  16-QAM.  Results  based  on  1000 
Monte  Carlo  trials  of  an  i.i.d.  Rayleigh  flat  fading  channel  with  Mt  =  4  and  Mr  =  4. 
Figure 4-4: BER performances of the UCD-DP, UCD-VBLAST schemes and the imaginary
UCD-genie scheme. Results based on 1000 Monte Carlo trials of an i.i.d. Rayleigh
flat fading channel with Mt = 10 and Mr = 10.


CHAPTER  5 
TUNABLE  CHANNEL  DECOMPOSITION 

5.1  Introduction 

All the aforementioned MIMO transceiver designs focus on improving the communication
quality subject to power constraints. In this chapter, we tackle a new aspect of
the MIMO transceiver design problem. We regard a MIMO transceiver design as a way
of decomposing a MIMO channel into multiple subchannels. As we have mentioned, the
MIMO channel decomposition through SVD plus "water filling" lacks flexibility despite
its optimality in terms of achieving the maximal overall channel capacity. The success
of UCD motivates a much more flexible channel decomposition approach, namely the
tunable channel decomposition (TCD) scheme, which is the main result of this chapter.
Using the recently developed generalized triangular decomposition (GTD), we propose
the TCD scheme to decompose a MIMO channel into multiple subchannels with prescribed
capacities or, equivalently, signal-to-interference-and-noise ratios (SINRs). The
main properties of the TCD scheme are summarized as follows:

1. Given $K$ parallel subchannels with capacities $C_1, C_2, \ldots, C_K$, which are obtained
by applying SVD plus "water filling" to a rank-$K$ MIMO channel, TCD can
convert the $K$ subchannels into $L \ge K$ subchannels with capacities $R_1, R_2, \ldots, R_L$
if and only if $(C_1, \ldots, C_K, 0, \ldots, 0) \in \mathbb{R}^L$ majorizes $(R_1, R_2, \ldots, R_L)$.^1 In
particular, $\sum_{i=1}^{K} C_i = \sum_{i=1}^{L} R_i$, i.e., the TCD is capacity lossless.

2. The TCD scheme has two implementation forms. One is the combination of a
linear precoder and a minimum mean-squared-error VBLAST (MMSE-VBLAST)
detector, which is referred to as TCD-VBLAST, and the other includes a DP
precoder and a linear equalizer followed by a DP decoder, which we refer to as
TCD-DP.

3. Given the SVD of the MIMO channel matrix, the computational complexity of
TCD, i.e., of calculating the precoder and equalizer matrices, is $O(KL)$, which
is computationally quite efficient.


^1 The concept of majorization is introduced in Section 5.2.

Originating at almost the same time as the research on MIMO transceiver designs,
the optimal design of symbol synchronous CDMA (S-CDMA) sequences has been under
intensive study over the past decade (see, e.g., [31] [54] [55] [56]). Although the two
research topics have been studied in an apparently independent manner in the signal
processing and information theory communities, the CDMA sequence design problem
can be viewed as a special case of the MIMO transceiver design, as we have shown
in Section 2.1.1. Hence the TCD scheme can be applied, with little modification, to
the design of optimal CDMA sequences. Moreover, the TCD-VBLAST and TCD-DP
schemes can be applied to design optimal CDMA sequences in the uplink (mobile-to-base)
and downlink (base-to-mobile) scenarios, respectively. Our TCD scheme, which
is independently motivated by the MIMO transceiver design problem, turns out to be
related to the scheme proposed in [56]. The relationship is discussed in Section 5.3.

5.2    Channel  Model  and  Preliminaries 
5.2.1    Channel  Model 

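To facilitate the discussion, we rewrite the channel model used in the previous
chapters:

$$y = HFx + z, \tag{5.1}$$

where $x \in \mathbb{C}^{L \times 1}$ is the information symbols precoded by the linear precoder $F \in \mathbb{C}^{M_t \times L}$,
$y \in \mathbb{C}^{M_r \times 1}$ is the received signal, and $H \in \mathbb{C}^{M_r \times M_t}$ is the channel matrix with rank
$K$. We assume $E[xx^*] = \sigma_x^2 I_L$, and $z \sim N(0, \sigma_z^2 I_{M_r})$ is the circularly symmetric complex
Gaussian noise. We define the SNR as

$$\rho = \frac{\sigma_x^2}{\sigma_z^2}\,\mathrm{Tr}\{FF^*\} \triangleq \frac{1}{\alpha}\,\mathrm{Tr}\{FF^*\}, \tag{5.2}$$

where $\alpha = \sigma_z^2/\sigma_x^2$.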


5.2.2    Channel  Decomposition 

Denote the SVD of a rank-$K$ channel $H$ as $H = U\Lambda V^*$, where $\Lambda$ is a $K \times K$
diagonal matrix whose diagonal elements $\{\lambda_{H,k}\}_{k=1}^{K}$ are the nonzero singular values of
$H$. To maximize the channel capacity with respect to $F$ given the input power constraint
$\mathrm{Tr}\{FF^*\} \le \rho\sigma_z^2/\sigma_x^2 = \rho\alpha$, one needs to solve

$$C_{IT} = \max_{\mathrm{Tr}\{FF^*\} \le \rho\alpha} \log_2\left|I + \alpha^{-1}HFF^*H^*\right|. \tag{5.3}$$

The optimal linear precoder is (cf. (2.8))

$$F = V\Phi^{1/2}. \tag{5.4}$$

Here $L = K$ and $\Phi$ is diagonal; its $k$th ($1 \le k \le K$) diagonal element $\phi_k$ determines
the power loaded to the $k$th subchannel and is found via "water filling" to be

$$\phi_k(\mu) = \left(\mu - \frac{\alpha}{\lambda_{H,k}^2}\right)^+, \tag{5.5}$$

with $\mu$ being chosen such that $\sum_{k=1}^{K} \phi_k(\mu) = \rho\alpha$ and $(a)^+ = \max\{0, a\}$. In this case,
we obtain $K$ subchannels with capacities

$$C_k = \log_2\left(1 + \frac{\phi_k\lambda_{H,k}^2}{\alpha}\right) = \left(\log_2\frac{\mu\lambda_{H,k}^2}{\alpha}\right)^+ \ \text{bps/Hz}, \quad k = 1, 2, \ldots, K. \tag{5.6}$$


Due to the usually large dynamic range of the singular values $\{\lambda_{H,k}\}_{k=1}^{K}$, the SVD
decomposes a MIMO channel into multiple parallel eigen-subchannels with different channel
capacities. Moreover, since the optimal power loading levels are fixed as given in (5.5),
the achievable MIMO channel decomposition is rigidly given in (5.6) and it lacks
flexibility.
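
To make (5.5) and (5.6) concrete, the following Matlab lines give a minimal water-filling sketch. The singular values, `alpha`, and `rho` are illustrative assumptions, and the search over the number of active subchannels is just one common way to find the water level $\mu$; it is not taken from this dissertation.

```matlab
% Water filling for (5.5)-(5.6); all numerical values are illustrative.
lambda = [2.0 1.0 0.3];     % nonzero singular values of H (assumed)
alpha  = 1;                 % alpha = sigma_z^2 / sigma_x^2 (assumed)
rho    = 2;                 % SNR, so the power budget is rho*alpha

inv_gain = alpha ./ lambda.^2;          % "floor" alpha/lambda_k^2 of each subchannel
[floors, idx] = sort(inv_gain);         % strongest subchannel first
K = numel(floors);
for n = K:-1:1                          % try n active subchannels
    mu = (rho*alpha + sum(floors(1:n))) / n;   % candidate water level
    if mu > floors(n)
        break;                          % all n subchannels receive positive power
    end
end
phi = zeros(1, K);
phi(idx) = max(mu - floors, 0);         % power loading, (5.5)
C = log2(1 + phi .* lambda.^2 / alpha); % subchannel capacities in bps/Hz, (5.6)
disp([phi; C]);
```

With these numbers the weakest eigen-subchannel receives no power, which illustrates how rigidly the SVD decomposition fixes the subchannel capacities.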

Another way of decomposing a MIMO channel is to use the VBLAST detector [5].
The VBLAST scheme involves sequential nulling and cancellation and it decomposes
the MIMO channel into $K$ subchannels (or layers as coined in [5]). By changing the
ordering of the signal detection, we can get $K!$ subchannel combinations, each of which
is capacity lossless [48].

Theoretically, more combinations of subchannels are possible via time sharing (see
[57, Ch. 14.3]). Recall that every DBLAST layer sends its data substream across the $K$
transmitting antennas, or VBLAST layers, in a time sharing manner [2]. For example,
for a system with $M_t = 2$, the transmitted data are

    Vertical Layer-I  :   x_1   y_2   x_3   y_4   ...
                                                        (5.7)
    Vertical Layer-II :   0     x_2   y_3   x_4   ...

Let $x_i$ and $y_i$, $i = 1, 2, \ldots$, denote the symbols transmitted through the DBLAST layers
I and II, respectively, at time $i$. The receiver first estimates $x_1$ and then estimates $x_2$
by regarding $y_2$ as interference. The estimates of $x_1, x_2$ are decoded jointly, which form
the output of the diagonal layer I. After subtracting out the effect of $x_1, x_2$ from the
received data, we can estimate and decode $y_2, y_3$, which form the diagonal layer II. We
remark that DBLAST can be viewed as a combination of VBLAST and the time sharing
technique, which decomposes the MIMO channel into multiple identical subchannels.

However, time sharing can be difficult to implement in practice. For instance, the
major difficulty of DBLAST is the requirement of encoding the diagonal layer with short
and efficient error correction codes, which limits its practical implementation despite its
superb theoretical performance analyzed in [16].

If CSIT is available, more flexible and practical channel decompositions can be
achieved. In Chapter 4, we have proposed the UCD scheme, which combines the geometric
mean decomposition (GMD) developed in Section 6.2 with either an MMSE-VBLAST
detector or a DP precoder to decompose the MIMO channel of (5.1) into $L \ge K$ identical
subchannels. Hence, the UCD scheme can achieve the theoretical performance of
the DBLAST scheme without resorting to any error correcting coding.

In this chapter, we generalize the results of Chapter 4 and develop a systematic
channel decomposition that combines the recently proposed GTD algorithm with either
an MMSE-VBLAST detector or a DP precoder. We show that given $K$ parallel subchannels
with capacities $C_1, C_2, \ldots, C_K$, which are obtained via SVD, TCD can convert the
$K$ subchannels into $L \ge K$ subchannels^2 with capacities $R_1, R_2, \ldots, R_L$ if and only if
$(R_1, R_2, \ldots, R_L)$ is majorized by $(C_1, \ldots, C_K, 0, \ldots, 0) \in \mathbb{R}^L$. This scheme is particularly
relevant to the applications where independent data streams with different qualities-of-service
(QoS) share the same MIMO channel [28]. For example, video services usually
require higher SNRs than audio services. Decomposing a MIMO channel into multiple
subchannels with prescribed capacities and transmitting independent data streams
through these subchannels can greatly simplify resource allocation.

5.2.3    Majorization and Generalized Triangular Decomposition

We introduce several basic concepts and theorems of the majorization theory from [58].

Definition 1 For $x, y \in \mathbb{R}^n$, if

$$\sum_{i=1}^{j} x_{[i]} \le \sum_{i=1}^{j} y_{[i]}, \quad 1 \le j \le n, \tag{5.8}$$

with equality holding for $j = n$, where the subscript $[i]$ denotes the $i$th largest element of the
sequence, we say that $x$ is majorized by $y$ and write $x \prec_+ y$ or, equivalently, $y \succ_+ x$.

^2 If $L < K$, some eigen-subchannels are discarded, which causes capacity loss. Hence
we focus on the case of $L \ge K$.

Definition 2 An $n \times n$ matrix $P$ is doubly stochastic if its $(i,j)$th entry $p_{ij} \ge 0$ for
$i, j = 1, \ldots, n$, and $\sum_{i=1}^{n} p_{ij} = 1$ for every $j$ and $\sum_{j=1}^{n} p_{ij} = 1$ for every $i$.

Theorem 5.2.1 $x \prec_+ y$ if and only if there exists a doubly stochastic matrix $P$ such
that $x = Py$.

A square matrix $\Pi$ is said to be a permutation matrix if each row and column has a
single one, and all the other entries are zero. There are $n!$ permutation matrices of size
$n \times n$.

Theorem 5.2.2 The permutation matrices constitute the extreme points of the set of
doubly stochastic matrices. Moreover, the set of doubly stochastic matrices is the convex
hull of the permutation matrices.

It follows from Theorems 5.2.1 and 5.2.2 that the set $\{x \mid x \prec_+ y\}$ is the convex
hull spanned by the $n!$ points which are the permutations of $y$.

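As we have mentioned before, given $K$ parallel subchannels with capacities $C_1, C_2, \ldots, C_K$,
which are obtained via SVD, TCD can convert the $K$ subchannels into $L \ge K$ subchannels
with capacities $R_1, R_2, \ldots, R_L$ if and only if $(R_1, R_2, \ldots, R_L) \prec_+ (C_1, \ldots, C_K, 0, \ldots, 0) \in \mathbb{R}^L$.
For example, for a MIMO channel $H$ with rank $K = 3$, assume that the capacities
of the 3 subchannels obtained via SVD are $C_1 \ge C_2 \ge C_3$. If $L = K$, then TCD can
decompose the MIMO channel into 3 subchannels with a rate vector $r = (R_1, R_2, R_3)$ if
and only if $r$ lies in the convex hull of the six permutations of $(C_1, C_2, C_3)$:

$$\mathrm{Co}\left\{
\begin{pmatrix} C_1 \\ C_2 \\ C_3 \end{pmatrix},
\begin{pmatrix} C_1 \\ C_3 \\ C_2 \end{pmatrix},
\begin{pmatrix} C_2 \\ C_1 \\ C_3 \end{pmatrix},
\begin{pmatrix} C_2 \\ C_3 \\ C_1 \end{pmatrix},
\begin{pmatrix} C_3 \\ C_1 \\ C_2 \end{pmatrix},
\begin{pmatrix} C_3 \\ C_2 \\ C_1 \end{pmatrix}
\right\}. \tag{5.9}$$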
Here Co stands for the convex hull defined as

$$\mathrm{Co}\{S\} = \{\theta_1 x_1 + \cdots + \theta_k x_k \mid x_i \in S,\ \theta_i \ge 0,\ \theta_1 + \cdots + \theta_k = 1\}. \tag{5.10}$$

In general, the "capacity region" is a convex hull defined by $K!$ vertices in a $K$-dimensional
space. Since the TCD is capacity lossless, i.e., $\sum_{i=1}^{K} C_i = \sum_{i=1}^{K} R_i$, the
capacity region falls into a $(K-1)$-dimensional hyperplane. The gray area in Figure
5-1 shows the convex hull of (5.9) with $C_1 = 3$, $C_2 = 2$, and $C_3 = 1$. In this case, the 6
vertices lie in the 2-D plane $\{x : x_1 + x_2 + x_3 = 6\}$. An interesting special case is the UCD
scheme [59], which achieves the rate vector corresponding to the center of the convex
hull, i.e., $r = (2, 2, 2)$.


Figure  5-1:  Illustration  of  the  capacity  lossless  region  obtainable  via  TCD.  We  assume 
K  =  3,  Ci  =  3,  C2  =  2,  and  C3  =  1. 
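
As a quick numerical companion to Definition 1 and the example of Figure 5-1, the sketch below tests additive majorization for $C = (3, 2, 1)$. The helper name `is_majorized`, the tolerance, and the particular doubly stochastic matrix are illustrative assumptions, not part of the original text.

```matlab
% Additive majorization test of Definition 1 ("x is majorized by y").
tol = 1e-12;
is_majorized = @(x, y) ...
    all(cumsum(sort(x, 'descend')) <= cumsum(sort(y, 'descend')) + tol) && ...
    abs(sum(x) - sum(y)) < tol;

C = [3 2 1];                          % capacities used in Figure 5-1
P = [0.5 0.5 0; 0.5 0.5 0; 0 0 1];    % a doubly stochastic matrix (illustrative)
r = (P * C.').';                      % r = P*C, cf. Theorem 5.2.1

disp(is_majorized(r, C));             % rate vector obtained from P
disp(is_majorized([2 2 2], C));       % the UCD rate vector
disp(is_majorized([3.5 1.5 1], C));   % largest rate exceeds C_1
```

The first two rate vectors lie in the capacity lossless region, while the third does not because its largest entry exceeds $C_1 = 3$.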

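Definition 3 For $x, y \in \mathbb{R}_+^n$, if

$$\prod_{i=1}^{j} x_{[i]} \le \prod_{i=1}^{j} y_{[i]}, \quad 1 \le j \le n, \tag{5.11}$$

with equality for $j = n$, we say that $x$ is multiplicatively majorized by $y$ and write
$x \prec_\times y$ or, equivalently, $y \succ_\times x$.

Obviously, if $x \prec_\times y$, then $\log x \prec_+ \log y$.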

Now we are ready to introduce the GTD theorem.

Theorem 5.2.3 (GTD theorem) Let $H \in \mathbb{C}^{m \times n}$ have rank $K$ with singular values
$\lambda \in \mathbb{R}^K$. There exists an upper triangular matrix $R \in \mathbb{C}^{K \times K}$ and matrices $Q$ and $P$
with orthonormal columns such that $H = QRP^*$ if and only if the diagonal elements of
$R$ satisfy $|r| \prec_\times \lambda$.

Proof: We relegate the proof to Chapter 6. ■

There is a computationally efficient and numerically stable algorithm to achieve the
GTD predicted by Theorem 5.2.3, which is presented in Chapter 6.
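
For intuition, the smallest nontrivial case of Theorem 5.2.3 can be written in closed form. The sketch below handles a 2-by-2 diagonal matrix with a pair of rotations; it is a standard construction used here only as an illustration and is not claimed to be the presentation of Chapter 6. The values `d1`, `d2`, and `s` are assumptions.

```matlab
% 2-by-2 building block: diag(d1, d2) = Q*R*P.' with R upper triangular
% and R(1,1) = s, for any s with d2 <= s <= d1.
d1 = 3; d2 = 1;          % singular values, d1 > d2 > 0 (illustrative)
s  = 2;                  % prescribed first diagonal entry (illustrative)

c  = sqrt((s^2 - d2^2) / (d1^2 - d2^2));
sn = sqrt(1 - c^2);
P  = [c -sn; sn c];                      % rotation applied on the right
Q  = [c*d1 -sn*d2; sn*d2 c*d1] / s;      % orthogonal matrix applied on the left
R  = Q.' * diag([d1 d2]) * P;            % upper triangular by construction

disp(R);                                 % diagonal is [s, d1*d2/s]
```

Here $(s, d_1 d_2 / s) = (2, 1.5)$ is multiplicatively majorized by $(3, 1)$, as the theorem requires, and the product of the diagonal entries is preserved.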

5.3    Tunable  Channel  Decomposition 

5.3.1  TCD-VBLAST 

We see from (5.2) that $F$ can always be scaled such that $\alpha = 1$. Hence without loss
of generality, we let $\alpha = 1$ in the sequel to simplify the notation.

Denote the SVD of a rank-$K$ channel $H$ as $H = U\Lambda V^*$, where $\Lambda$ is a $K \times K$
diagonal matrix whose diagonal elements $\{\lambda_{H,k}\}_{k=1}^{K}$ are the nonzero singular values of
$H$. The conventional SVD based linear transceiver designs have precoder $F = V\Phi^{1/2}$,
where $\Phi$ is a diagonal matrix whose diagonal elements stand for the power allocation.
The precoder $F$ transforms the MIMO channel into $K$ orthogonal subchannels with
capacities

$$C_k = \log_2\left(1 + \lambda_{H,k}^2\phi_k\right) \ \text{bps/Hz}, \quad k = 1, 2, \ldots, K. \tag{5.12}$$

For this kind of precoder design, the only way of controlling the capacity of the
subchannels is to change the power allocation $\Phi$.
If we modify the precoder $F$ to be^3

$$F = V\Phi^{1/2}\Omega^T, \tag{5.13}$$

where $\Omega \in \mathbb{R}^{L \times K}$ with $L \ge K$ and $\Omega^T\Omega = I$, then it can be readily seen that
introducing $\Omega$ does not change the overall channel capacity. However, it brings much
greater flexibility, as demonstrated in the following theorem.
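
The capacity claim can be checked in one line. The short derivation below is added for completeness; it assumes $\alpha = 1$ as in the rest of this section and uses only $\Omega^T\Omega = I$ and the SVD $H = U\Lambda V^*$.

```latex
\begin{align*}
FF^* &= V\Phi^{1/2}\,\Omega^T\Omega\,\Phi^{1/2}V^* = V\Phi V^*, \\
\log_2\left|I + HFF^*H^*\right|
  &= \log_2\left|I + U\Lambda\Phi\Lambda U^*\right|
   = \log_2\left|I_K + \Lambda^2\Phi\right|
   = \sum_{k=1}^{K}\log_2\left(1 + \lambda_{H,k}^2\phi_k\right)
   = \sum_{k=1}^{K} C_k .
\end{align*}
```

The overall capacity therefore depends on the power allocation $\Phi$ only, which leaves $\Omega$ free to redistribute the capacity among the $L$ subchannels.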

Theorem 5.3.1 (TCD Theorem) Consider a MIMO channel of (4.1) with $F$ given
in (5.13). For any $L \ge K$, let $c \in \mathbb{R}^L$ be a zero vector with its first $K$ elements
replaced with $\{C_k\}_{k=1}^{K}$, where $C_k = \log(1 + \lambda_{H,k}^2\phi_k)$. Given any rates $\{R_k\}_{k=1}^{L}$, we can
find a matrix $\Omega \in \mathbb{R}^{L \times K}$ with orthonormal columns such that the combination of the linear precoder
$F = V\Phi^{1/2}\Omega^T$ and the MMSE-VBLAST detector yields $L$ subchannels with capacities
$\{R_k\}_{k=1}^{L}$ if and only if $\{R_k\}_{k=1}^{L} \prec_+ c$.

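Proof: Given the precoder of (5.13), the virtual channel is

$$G \triangleq HF = U\Lambda\Phi^{1/2}\Omega^T \triangleq U\Lambda_G\Omega^T, \tag{5.14}$$

where $\Lambda_G = \Lambda\Phi^{1/2}$ is a diagonal matrix with diagonal elements

$$\lambda_{G,i} = \lambda_{H,i}\phi_i^{1/2}, \quad i = 1, \ldots, K. \tag{5.15}$$

^3 Letting $\Omega$ be complex-valued does not introduce additional flexibility, as is clear
from the GTD algorithm.

Let the augmented matrix $G_a$ be defined as

$$G_a = \begin{bmatrix} G \\ I_L \end{bmatrix}_{(M_r+L) \times L}. \tag{5.16}$$

After some straightforward calculations, we can obtain the SVD of $G_a$ as follows:

$$G_a = \begin{bmatrix} U\,[\Lambda_G \,\vdots\, 0_{K \times (L-K)}]\,\Lambda_{G_a}^{-1} \\ \Omega_0\Lambda_{G_a}^{-1} \end{bmatrix} \Lambda_{G_a}\,\Omega_0^T, \tag{5.17}$$

where $\Omega_0 \in \mathbb{R}^{L \times L}$ is orthogonal with its first $K$ columns forming $\Omega$ and the diagonal
matrix $\Lambda_{G_a}$ contains the singular values of $G_a$:

$$\lambda_{G_a,i} = \begin{cases} \sqrt{1 + \lambda_{G,i}^2}, & 1 \le i \le K, \\ 1, & i > K. \end{cases} \tag{5.18}$$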

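According to Theorem 5.2.3, we can apply GTD to obtain

$$\Lambda_{G_a} = QR_{G_a}P^T \tag{5.19}$$

if and only if the diagonal elements of $R_{G_a} \in \mathbb{R}^{L \times L}$, which we denote as $\{r_{G_a,ii}\}_{i=1}^{L}$,
satisfy

$$\{r_{G_a,ii}\}_{i=1}^{L} \prec_\times \{\lambda_{G_a,i}\}_{i=1}^{L}. \tag{5.20}$$

Note that both $Q$ and $P$ in (5.19) are real-valued matrices because $\Lambda_{G_a}$ is a real-valued
diagonal matrix. Inserting (5.19) into (5.17) yields

$$G_a = \begin{bmatrix} U\,[\Lambda_G \,\vdots\, 0_{K \times (L-K)}]\,\Lambda_{G_a}^{-1} \\ \Omega_0\Lambda_{G_a}^{-1} \end{bmatrix} QR_{G_a}P^T\Omega_0^T. \tag{5.21}$$

Choose $\Omega_0 = P^T$ and define

$$Q_{G_a} = \begin{bmatrix} U\,[\Lambda_G \,\vdots\, 0_{K \times (L-K)}]\,\Lambda_{G_a}^{-1} \\ \Omega_0\Lambda_{G_a}^{-1} \end{bmatrix} Q. \tag{5.22}$$

Then (5.21) can be rewritten as $G_a = Q_{G_a}R_{G_a}$, which is the QR decomposition of $G_a$.
By Lemma 4.1.2, it follows that for $\alpha = 1$, (5.20) is equivalent to

$$\left\{\sqrt{1 + \rho_i}\right\}_{i=1}^{L} \prec_\times \{\lambda_{G_a,i}\}_{i=1}^{L}, \tag{5.23}$$

where $\rho_i$, $1 \le i \le L$, denotes the output SINR of the $i$th subchannel, and $\lambda_{G_a,i}$ is given
in (5.18). If

$$\{R_i\}_{i=1}^{L} = \{\log(1 + \rho_i)\}_{i=1}^{L} \prec_+ \{\log\lambda_{G_a,i}^2\}_{i=1}^{L} = c, \tag{5.24}$$

then (5.20) and (5.23) hold, which implies the existence of $\Omega$ (the first $K$ columns of
$P^T$).

Conversely, suppose that there exists a semi-unitary matrix $\Omega$ such that the linear
precoder $F = V\Phi^{1/2}\Omega^T$ and the MMSE-VBLAST detector yield $L$ subchannels with
capacities $\{R_k\}_{k=1}^{L}$. Let $G_a = Q_{G_a}R_{G_a}$ be the QR decomposition of the augmented
matrix. It follows from Theorem 5.2.3 that (5.20) holds. Hence, by (5.23), we conclude
that (5.24) holds. ■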

The proof of Theorem 5.3.1 is constructive. Indeed, given the SVD of $H$ and the
power loading level $\Phi$, we only need to calculate $\Lambda_G$, $\Lambda_{G_a}$, and the GTD of $\Lambda_{G_a}$ given
in (5.19). Then we immediately obtain the linear precoder

$$F = V\Phi^{1/2}\Omega^T = V\,[\Phi^{1/2} \,\vdots\, 0_{K \times (L-K)}]\,P. \tag{5.25}$$

Let $Q_{G_a}^u$ denote the first $M_r$ rows of $Q_{G_a}$. Then it follows from (5.22) that

$$Q_{G_a}^u = U\,[\Gamma \,\vdots\, 0_{K \times (L-K)}]\,Q, \tag{5.26}$$

where $\Gamma \in \mathbb{R}^{K \times K}$ is diagonal with its $i$th diagonal element being $\gamma_i = \lambda_{G,i}/\sqrt{1 + \lambda_{G,i}^2}$. According
to Lemma 4.1.1, the nulling vectors are calculated as

$$w_i = r_{G_a,ii}^{-1}\,q_{G_a,i}, \quad 1 \le i \le L, \tag{5.27}$$

where $r_{G_a,ii}$ is the $i$th diagonal element of $R_{G_a}$ and $q_{G_a,i}$ is the $i$th column of $Q_{G_a}^u$.

In the GTD algorithm, $P$ and $Q$ are obtained via multiplication of $L - 1$ Givens rotation
matrices. Hence calculating (5.25) and (5.26) needs $O(M_t(L + K))$ and $O(M_r(L + K))$
flops, respectively. We note that the decoding starts with the $L$th layer, then the
$(L-1)$th, and so on.

Given the SVD of $H$ and the power allocation level $\Phi$, the TCD-VBLAST scheme
needs to run the procedures summarized in Table 5-1. If $M_t = M_r = K$, then the
TCD-VBLAST scheme requires only $O(L^2 + K^2 + KL)$ flops, given the SVD of the channel
matrix.

Table 5-1: The TCD-VBLAST Scheme

    Step   Operation                                          Flops
    1      Calculate $\Lambda_G = \Lambda\Phi^{1/2}$          $O(K)$
    2      Obtain $\Lambda_{G_a}$ using (5.18)                $O(K)$
    3      Apply GTD to $\Lambda_{G_a}$ to obtain (5.19)      $O(L^2)$
    4      Generate $F$ using (5.25)                          $O(M_t(L + K))$
    5      Compute $Q_{G_a}^u$ using (5.26)                   $O(M_r(L + K))$
    6      Calculate $\{w_i\}_{i=1}^{L}$ using (5.27)         $O(M_r L)$
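
The steps of Table 5-1 can be traced numerically in the smallest case $K = L = 2$ with $\alpha = 1$. The sketch below is only an illustration under stated assumptions: the channel `H`, the power loading `phi`, and the tuning parameter `t` are made up, and the 2-by-2 closed-form rotation stands in for the general GTD algorithm of Chapter 6. The MMSE-VBLAST output SINRs are computed directly from the virtual channel and compared with the prescribed ones.

```matlab
% TCD-VBLAST sketch for K = L = 2 with alpha = 1; all values illustrative.
H = [2 1; 0 1];                       % example channel (assumed)
[U, S, V] = svd(H);
lamH = diag(S).';                     % singular values, descending
phi  = [1 1];                         % power loading, assumed given

lamG  = lamH .* sqrt(phi);            % step 1, (5.15)
lamGa = sqrt(1 + lamG.^2);            % step 2, (5.18)

% Prescribed diagonal of R: any r1 between lamGa(2) and lamGa(1) is feasible.
t   = 0.7;                            % t = 0.5 gives equal rates (UCD)
r1  = lamGa(1)^t * lamGa(2)^(1 - t);
r2  = prod(lamGa) / r1;
rho = [r1 r2].^2 - 1;                 % prescribed SINRs, cf. (5.23)

% Step 3: 2-by-2 GTD of diag(lamGa); only P is needed for the precoder.
c  = sqrt((r1^2 - lamGa(2)^2) / (lamGa(1)^2 - lamGa(2)^2));
sn = sqrt(1 - c^2);
P  = [c -sn; sn c];

F = V * diag(sqrt(phi)) * P;          % step 4, (5.25) with L = K
G = H * F;                            % virtual channel

% MMSE-VBLAST output SINRs: layer 2 is detected first, then layer 1.
sinr = zeros(1, 2);
for i = 1:2
    Gi = G(:, 1:i-1);                 % layers that still interfere with layer i
    sinr(i) = real(G(:, i)' * ((eye(2) + Gi*Gi') \ G(:, i)));
end
disp([rho; sinr]);                    % the two rows should agree
```

Changing `t` moves the rate pair along the capacity lossless region while the total power `sum(phi)` and the sum rate stay fixed.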

5.3.2  TCD-DP 

Similar to UCD, TCD also has two implementation forms, which are dual to
each other. As a dual form of TCD-VBLAST, the TCD scheme can be implemented by
using a DP precoder, which we refer to as TCD-DP. For TCD-DP, a direct construction
of the linear precoder $F$ as done in Section 5.3.1 is not obvious. Instead, we exploit the
uplink-downlink duality revealed in [49] to obtain TCD-DP. This technique is also used
in [59].

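We first apply the TCD-VBLAST scheme to the reverse channel

$$y = H^*Fx + z, \tag{5.28}$$

where the roles of the transmitter and receiver are exchanged and the $H$ in (5.1) is
replaced by $H^*$. Then we obtain the precoder $F$ and the equalizer $W = [w_1, \ldots, w_L]$
from $H^*$ according to (5.25) and (5.27), respectively. Applying $F$ and the VBLAST
detector with nulling vectors $\{w_i\}_{i=1}^{L}$, we obtain $L$ subchannels

$$w_i^*y = w_i^*H^*f_ix_i + \sum_{j=1}^{i-1} w_i^*H^*f_jx_j + w_i^*z, \quad i = 1, \ldots, L, \tag{5.29}$$

where the $i$th subchannel (5.29) is free of interference from the $j$th ($j > i$) subchannels,
which are detected and cancelled out in advance. The SINR of the subchannel (5.29) is

$$\rho_i = \frac{|w_i^*H^*f_i|^2\sigma_x^2}{\sigma_z^2\|w_i\|^2 + \sum_{j=1}^{i-1}|w_i^*H^*f_j|^2\sigma_x^2}. \tag{5.30}$$

Note that replacing $w_i$ by $\bar{w}_i$, which is obtained by scaling $w_i$ such that $\|\bar{w}_i\| = 1$, does
not change $\rho_i$ since the output SINR is invariant to the length of $w_i$. Also note that
$\alpha = 1$, i.e., $\sigma_x^2 = \sigma_z^2$. Hence (5.30) can be simplified to

$$\rho_i = \frac{|\bar{w}_i^*H^*f_i|^2}{1 + \sum_{j=1}^{i-1}|\bar{w}_i^*H^*f_j|^2}. \tag{5.31}$$

Let $\bar{f}_i$, $i = 1, \ldots, L$, be the scaled version of $f_i$ with unit length. Denote $p_i = \|f_i\|^2$.
Then

$$\rho_i = \frac{|\bar{w}_i^*H^*\bar{f}_i|^2 p_i}{1 + \sum_{j=1}^{i-1}|\bar{w}_i^*H^*\bar{f}_j|^2 p_j}, \quad i = 1, \ldots, L. \tag{5.32}$$

Let $a_{ij} = |\bar{f}_i^*H\bar{w}_j|^2$. Then (5.32) can be represented in the matrix form

$$\begin{bmatrix}
a_{11} & 0 & \cdots & 0 \\
-\rho_2 a_{12} & a_{22} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
-\rho_L a_{1L} & -\rho_L a_{2L} & \cdots & a_{LL}
\end{bmatrix}
\begin{bmatrix} p_1 \\ p_2 \\ \vdots \\ p_L \end{bmatrix}
=
\begin{bmatrix} \rho_1 \\ \rho_2 \\ \vdots \\ \rho_L \end{bmatrix}. \tag{5.33}$$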

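According to the uplink-downlink duality, in the original channel, the precoder of
TCD-DP should be $\tilde{F} = [\sqrt{q_1}\,\bar{w}_1, \ldots, \sqrt{q_L}\,\bar{w}_L]$, where $\{q_i\}_{i=1}^{L}$ will be determined later in (5.37),
and the receiving vectors are $\bar{f}_i$, $i = 1, \ldots, L$. Then the $i$th scalar subchannel of the
MIMO channel is

$$y_i = \bar{f}_i^*H\bar{w}_i\sqrt{q_i}\,x_i + \sum_{j=i+1}^{L}\bar{f}_i^*H\bar{w}_j\sqrt{q_j}\,x_j + \sum_{j=1}^{i-1}\bar{f}_i^*H\bar{w}_j\sqrt{q_j}\,x_j + \bar{f}_i^*z. \tag{5.34}$$

Applying the dirty paper precoder to $x_i$ and treating $\sum_{j=1}^{i-1}\bar{f}_i^*H\bar{w}_j\sqrt{q_j}\,x_j$ as the interference
known at the transmitter (note that here we precode the first layer first while for
TCD-VBLAST, we detect the $L$th layer first), we obtain an equivalent subchannel

$$y_i = \bar{f}_i^*H\bar{w}_i\sqrt{q_i}\,x_i + \sum_{j=i+1}^{L}\bar{f}_i^*H\bar{w}_j\sqrt{q_j}\,x_j + \bar{f}_i^*z \tag{5.35}$$

with SINR (again, recall that $\alpha = 1$ and $\sigma_x^2 = \sigma_z^2$)

$$\rho_i = \frac{q_ia_{ii}}{1 + \sum_{j=i+1}^{L} q_ja_{ij}}, \quad \text{for } i = 1, 2, \ldots, L. \tag{5.36}$$

Similar to (5.32), (5.36) can also be represented as

$$\begin{bmatrix}
a_{11} & -\rho_1 a_{12} & \cdots & -\rho_1 a_{1L} \\
0 & a_{22} & \cdots & -\rho_2 a_{2L} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & a_{LL}
\end{bmatrix}
\begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_L \end{bmatrix}
=
\begin{bmatrix} \rho_1 \\ \rho_2 \\ \vdots \\ \rho_L \end{bmatrix}. \tag{5.37}$$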

It is easy to see that $q_i > 0$, $1 \le i \le L$. It is proven in [49] that $\sum_{i=1}^{L} q_i = \mathrm{tr}(\tilde{F}\tilde{F}^*) =
\mathrm{tr}(FF^*) = \sum_{i=1}^{L} p_i$. That is, to obtain $L$ subchannels with SINRs $\{\rho_i\}_{i=1}^{L}$, the TCD-DP
needs exactly the same power as the TCD-VBLAST. To make this chapter self-contained,
we give below an alternative proof of this interesting and useful fact.

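Let $U_A$ denote a strictly upper triangular matrix whose $(i,j)$th entry is $a_{ij}$ for
$1 \le i < j \le L$ and zero otherwise. Let $\mathcal{D}_a$ and $\mathcal{D}_\rho$ be two $L \times L$ diagonal matrices with
their $i$th diagonal elements equal to $a_{ii}$ and $\rho_i$, respectively. Then (5.32) can be rewritten as

$$\left(\mathcal{D}_a - \mathcal{D}_\rho U_A^T\right)p = \boldsymbol{\rho} \tag{5.38}$$

or equivalently

$$\left(\mathcal{D}_\rho^{-1}\mathcal{D}_a - U_A^T\right)p = \mathbf{1}, \tag{5.39}$$

where $\boldsymbol{\rho} = [\rho_1, \ldots, \rho_L]^T$, $p = [p_1, \ldots, p_L]^T$, and $\mathbf{1}$ is a vector with unit elements. Hence

$$p = \left(\mathcal{D}_\rho^{-1}\mathcal{D}_a - U_A^T\right)^{-1}\mathbf{1}. \tag{5.40}$$

Similarly, (5.37) can be rewritten as

$$\left(\mathcal{D}_a - \mathcal{D}_\rho U_A\right)q = \boldsymbol{\rho} \tag{5.41}$$

or

$$\left(\mathcal{D}_\rho^{-1}\mathcal{D}_a - U_A\right)q = \mathbf{1}. \tag{5.42}$$

Hence

$$q = \left(\mathcal{D}_\rho^{-1}\mathcal{D}_a - U_A\right)^{-1}\mathbf{1}. \tag{5.43}$$

From (5.40) and (5.43),

$$\sum_{i=1}^{L} p_i = \mathbf{1}^T\left(\mathcal{D}_\rho^{-1}\mathcal{D}_a - U_A^T\right)^{-1}\mathbf{1} = \mathbf{1}^T\left(\mathcal{D}_\rho^{-1}\mathcal{D}_a - U_A\right)^{-1}\mathbf{1} = \sum_{i=1}^{L} q_i. \tag{5.44}$$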
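
The power identity (5.44) is easy to confirm numerically. In the sketch below the gains $a_{ij}$ and the target SINRs are arbitrary illustrative numbers (only the diagonal and the strictly upper triangular part of `A` enter the two systems); the variable names are assumptions and do not appear in the original text.

```matlab
% Numerical check of (5.38)-(5.44) with illustrative values.
A   = [2.0 0.3 0.1; 0.4 1.5 0.2; 0.2 0.1 1.0];   % a_ij >= 0 (assumed)
rho = [0.5; 1.0; 2.0];                           % target SINRs (assumed)

Da   = diag(diag(A));          % diagonal matrix of the a_ii
Drho = diag(rho);              % diagonal matrix of the rho_i
UA   = triu(A, 1);             % strictly upper triangular part of A

p = (Da - Drho * UA.') \ rho;  % uplink (TCD-VBLAST) powers, (5.38)
q = (Da - Drho * UA)   \ rho;  % downlink (TCD-DP) powers, (5.41)

disp([sum(p) sum(q)]);         % equal by (5.44)
```

The individual powers `p` and `q` generally differ; only their sums coincide, because the two coefficient matrices are transposes of each other after scaling by $\mathcal{D}_\rho^{-1}$.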

We can use the Tomlinson-Harashima precoder [42] [43] or the trellis precoder [44]
to achieve known interference cancellation at the transmitter. For a system with high
dimensionality, TCD-DP is a better choice than TCD-VBLAST since it is free of error
propagation.

5.4    MIMO  Communications  with  QoS  Constraints 

In this section, we apply the TCD scheme to MIMO communications with QoS
constraints. Suppose we want to transmit $L \ge K$ independent data streams through a
MIMO channel. Instead of multiplexing all the substreams in the time division manner
to share the entire MIMO channel, we apply TCD to decompose the MIMO channel
into multiple subchannels whose capacities/SINRs meet the QoS requirements of the
substreams, and dedicate one subchannel to each substream. In [28], the authors studied
the same problem. They proposed a linear transceiver design which, similar to TCD, can
also control the SINR of each subchannel via designing the precoder. However, the linear
transceiver is capacity lossy and can suffer from considerable performance degradation
compared with our TCD scheme, as we will show at the end of this section. Given that
all the subchannels meet the QoS constraints, we want to minimize the overall input
power. We need to solve the following optimization problem:

$$\begin{aligned}
\min_F \quad & \mathrm{tr}(FF^*) \\
\text{subject to} \quad & \begin{bmatrix} HF \\ I_L \end{bmatrix} = QR, \\
& \mathrm{diag}(R) = \left\{\sqrt{1 + \rho_i}\right\}_{i=1}^{L}.
\end{aligned} \tag{5.45}$$

Here $QR$ denotes the QR decomposition and $\mathrm{diag}(R)$ denotes the vector formed by the
diagonal of $R$. According to Lemma 4.1.2, the diagonal of $R$ determines the SINRs of
the subchannels. Without loss of generality, we assume that $\rho_1 \ge \rho_2 \ge \cdots \ge \rho_L$. We
now consider a problem whose constraints are more relaxed than those of (5.45):

$$\begin{aligned}
\min_F \quad & \mathrm{tr}(FF^*) \\
\text{subject to} \quad & \lambda_{G_a} \succ_\times \left\{\sqrt{1 + \rho_i}\right\}_{i=1}^{L},
\end{aligned} \tag{5.46}$$

where $\lambda_{G_a}$ stands for the singular values of the augmented matrix $G_a$. In general, for
any matrix $A$, we let $\lambda_A$ denote the singular values of $A$. By Theorem 5.2.3, if $F$ is
feasible in (5.45), then $F$ is feasible in (5.46). We now further simplify (5.46) and show
that its solution provides a solution of (5.45).

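Theorem 5.4.1 If $H = U\Lambda V^*$ is the singular value decomposition of $H$, then (5.46)
has a solution of the form $F = V\Phi^{1/2}$, where $\Phi \in \mathbb{R}^{K \times K}$ is a diagonal matrix with
diagonal elements $\phi_i$, $1 \le i \le K$, chosen to solve the problem

$$\begin{aligned}
\min_{\phi} \quad & \sum_{i=1}^{K}\phi_i \\
\text{subject to} \quad & \prod_{i=1}^{k}\left(1 + \lambda_{H,i}^2\phi_i\right) \ge \prod_{i=1}^{k}(1 + \rho_i), \quad \phi_k \ge \phi_{k+1} \ge 0, \quad 1 \le k \le K - 1, \\
& \prod_{i=1}^{K}\left(1 + \lambda_{H,i}^2\phi_i\right) = \prod_{i=1}^{L}(1 + \rho_i).
\end{aligned} \tag{5.47}$$

Moreover, if $QR_{G_a}P^T$ is the GTD of $\Lambda_{G_a}$ in (5.19), then (5.45) has the solution $F =
V\Phi^{1/2}\Omega^T$, where $\Phi$ is a solution of (5.47) and $\Omega$ is the matrix formed by the first $K$
columns of $P^T$.

Proof: See Appendix A. ■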
We now develop an efficient algorithm for solving (5.47). We will see that the
constraint $\phi_k \ge \phi_{k+1}$ can be omitted since it is automatically satisfied at a minimizer of
(5.47). To begin, we make a change of variables to further simplify the formulation of
(5.47). We define

$$\begin{aligned}
\psi_i &= \phi_i + 1/\lambda_{H,i}^2, \quad 1 \le i \le K, \\
\beta_i &= \frac{1 + \rho_i}{\lambda_{H,i}^2}, \quad 1 \le i < K, \qquad \beta_K = \frac{1}{\lambda_{H,K}^2}\prod_{i=K}^{L}(1 + \rho_i).
\end{aligned} \tag{5.48}$$

With these definitions, (5.47) reduces to

$$\begin{aligned}
\min_{\psi} \quad & \sum_{i=1}^{K}\psi_i \\
\text{subject to} \quad & \prod_{i=1}^{k}\psi_i \ge \prod_{i=1}^{k}\beta_i, \quad \psi_k \ge \frac{1}{\lambda_{H,k}^2}, \quad 1 \le k \le K.
\end{aligned} \tag{5.49}$$

Both the equality constraint and the inequalities $\phi_k \ge \phi_{k+1}$ in (5.47) have been dropped
since these constraints are automatically satisfied at an optimum. The fact that $\phi_k \ge
\phi_{k+1}$ is established after Lemma 5.4.2. With regard to the equality constraint, if $\psi$ is
feasible in (5.49) and the inequality corresponding to $k = K$ is strict, then the
cost is reduced when the trailing components of $\psi$ are lowered.

Clearly, the feasible set for (5.49) is nonempty and the cost function tends to infinity
as any of the components of $\psi$ tends to infinity. By continuity of the cost function and
the constraints, a minimizer must exist. We now analyze the structure of the minimizer.
By exploiting the structure, we obtain a fast algorithm for solving (5.49).

We first study a similar optimization problem with relaxed constraints.

Lemma 5.4.1 Any solution $\psi$ of the problem

$$\min_{\psi}\ \sum_{i=1}^{K}\psi_i \quad \text{subject to} \quad \prod_{i=1}^{k}\psi_i \ge \prod_{i=1}^{k}\beta_i, \quad 1 \le k \le K, \tag{5.50}$$

has the property that $\psi_{i+1} \le \psi_i$ for each $i$.

Proof: We replace the inequalities in (5.50) by the equivalent constraints obtained
by taking logs:

$$\sum_{i=1}^{k}\log(\psi_i) \ge \sum_{i=1}^{k}\log(\beta_i), \quad 1 \le k \le K.$$

The Lagrangian $\mathcal{L}$ associated with (5.50), after this modification of the constraints, is

$$\mathcal{L}(\psi, \mu) = \sum_{k=1}^{K}\left(\psi_k - \mu_k\sum_{i=1}^{k}\left(\log(\psi_i) - \log(\beta_i)\right)\right).$$

By the first-order optimality conditions associated with $\psi$, there exists $\mu \ge 0$ with the
property that the gradient of the Lagrangian with respect to $\psi$ vanishes. Equating to
zero the partial derivative of the Lagrangian with respect to $\psi_j$, we obtain the relations

$$\psi_j = \sum_{i=j}^{K}\mu_i.$$

Hence, $\psi_j - \psi_{j+1} = \mu_j \ge 0$. ■
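Using Lemma 5.4.1, we can gain insights into the structure of a solution to (5.49).

Lemma 5.4.2 There exists a solution $\psi$ to (5.49) with the property that for some integer
$j \in [1, K]$,

$$\psi_{i+1} \le \psi_i \ \text{for all } i < j, \quad \psi_{i+1} \ge \psi_i \ \text{for all } i \ge j, \quad \psi_i = \frac{1}{\lambda_{H,i}^2} \ \text{for all } i > j. \tag{5.51}$$

In particular, $\psi_j \le \psi_i$ for all $i$.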

Proof: If $\psi$ is a solution of (5.49) with the property that $\psi_i > 1/\lambda_{H,i}^2$ for all $1 \le i \le
K$, then by the convexity of the constraints, it follows that $\psi$ is a solution of (5.50). By
Lemma 5.4.1, we conclude that Lemma 5.4.2 holds with $j = K$. Now, suppose that $\psi$ is
a solution of (5.49) with $\psi_i = 1/\lambda_{H,i}^2$ for some $i$. We wish to show that $\psi_k = 1/\lambda_{H,k}^2$ for
all $k > i$. Suppose, to the contrary, that there exists an index $k \ge i$ with the property
that $\psi_k = 1/\lambda_{H,k}^2$ and $\psi_{k+1} > 1/\lambda_{H,k+1}^2$. We show that components $k$ and $k + 1$ of $\psi$
can be modified so as to satisfy the constraints and make the cost strictly smaller. In
particular, let $\psi(\epsilon)$ be identical with $\psi$ except for components $k$ and $k + 1$:

$$\psi_k(\epsilon) = (1 + \epsilon)\psi_k \quad \text{and} \quad \psi_{k+1}(\epsilon) = \frac{\psi_{k+1}}{1 + \epsilon}. \tag{5.52}$$

For $\epsilon > 0$ small, $\psi(\epsilon)$ satisfies the constraints of (5.49). The change $\Delta(\epsilon)$ in the cost
function of (5.49) is

$$\Delta(\epsilon) = \epsilon\psi_k - \frac{\epsilon}{1 + \epsilon}\psi_{k+1}.$$

The derivative of $\Delta(\epsilon)$ evaluated at zero is

$$\Delta'(0) = \psi_k - \psi_{k+1}.$$

Since $1/\lambda_{H,k}^2$ is an increasing function of $k$ and since $\psi_k = 1/\lambda_{H,k}^2$, we conclude that
$\psi_{k+1} > \psi_k$ and $\Delta'(0) < 0$. Hence, for $\epsilon > 0$ near zero, $\psi(\epsilon)$ has a smaller cost than
$\psi(0)$, which yields a contradiction. Hence, there exists an index $j$ with the property
that $\psi_i = 1/\lambda_{H,i}^2$ for all $i > j$ and $\psi_i > 1/\lambda_{H,i}^2$ for all $i \le j$.

According to Lemma 5.4.1, $\psi_i \ge \psi_{i+1}$ for any $i < j$. To complete the proof, we
need to show that $\psi_j \le \psi_{j+1}$. As noted previously, any solution of (5.49) satisfies

$$\prod_{i=1}^{K}\psi_i = \prod_{i=1}^{K}\beta_i,$$

which implies (cf. (5.48))

$$\prod_{i=1}^{j}\psi_i = \prod_{i=1}^{j}\beta_i\left(\prod_{t=j+1}^{K}\beta_t\lambda_{H,t}^2\right) > \prod_{i=1}^{j}\beta_i,$$

where each factor $\beta_t\lambda_{H,t}^2$ exceeds one for positive target SINRs.
That is, the constraint $\prod_{i=1}^{j}\psi_i \ge \prod_{i=1}^{j}\beta_i$ in (5.49) is inactive. If $\psi_j > \psi_{j+1}$, we will
decrease the $j$-th component and increase the $(j+1)$-th component, while leaving the other
components unchanged. Letting $\psi(\delta)$ be the modified vector, we set

$$\psi_{j+1}(\delta) = (1 + \delta)\psi_{j+1} \quad \text{and} \quad \psi_j(\delta) = \frac{\psi_j}{1 + \delta}.$$

Since the $j$-th constraint in (5.49) is inactive, $\psi(\delta)$ is feasible for $\delta$ near zero. And if
$\psi_j > \psi_{j+1}$, then the cost decreases as $\delta$ increases. It follows that $\psi_j \le \psi_{j+1}$. ■
By Lemma 5.4.2, $\psi_i$ is a decreasing function of $i$ for $i \in [1, j]$ while $\psi_i = 1/\lambda_{H,i}^2$
for $i > j$. Since $\lambda_{H,i}$ is a decreasing function of $i$, it follows that $\phi_i = \psi_i - 1/\lambda_{H,i}^2$ is a
decreasing function of $i$ for $i \in [1, j]$ with $\phi_i > 0$, while $\phi_i = 0$ for $i > j$. Hence, $\phi_i$ is
a decreasing function of $i \in [1, K]$. In particular, the constraint $\phi_k \ge \phi_{k+1}$ in (5.47) is
automatically satisfied by the associated solution characterized in Lemma 5.4.2.

We refer to the index $j$ in Lemma 5.4.2 as the "break point." At the break point,
the lower bound constraint $\psi_i \ge 1/\lambda_{H,i}^2$ changes from inactive to active. We now use
Lemma 5.4.2 to obtain an algorithm for (5.49).
Lemma 5.4.3 Let $\gamma_k$ denote the $k$-th geometric mean of the $\beta_i$:

$$\gamma_k = \Bigl(\prod_{i=1}^{k}\beta_i\Bigr)^{1/k},$$

and let $l$ denote an index for which $\gamma_k$ is the largest:

$$l = \arg\max\{\gamma_k : 1 \le k \le K\}. \qquad (5.53)$$

If $\gamma_l \ge 1/\lambda_{H,l}^2$, then putting $\psi_i = \gamma_l$ for all $i \le l$ is optimal in (5.49). If $\gamma_l < 1/\lambda_{H,l}^2$, then $\psi_i = 1/\lambda_{H,i}^2$ for all $i \ge l$ at an optimal solution of (5.49).

Proof: See Appendix B. ■

    function psi = TCDPow(beta, Lambda)
    % beta: the beta_i of (5.49); Lambda: the singular values lambda_{H,i}
    L = 1; R = length(beta); psi = zeros(1, R);
    C = cumsum(log(beta));                    % C(k) = log(prod(beta(1:k)))
    while R >= L
        [t, l] = max(C(L:R) ./ [1:R-L+1]);    % largest log geometric mean
        gamma_l = exp(t); L1 = L + l - 1;
        if gamma_l >= 1/Lambda(L1)^2
            psi(L:L1) = gamma_l;              % first case of Lemma 5.4.3
            L = L1 + 1;
            C(L:R) = C(L:R) - C(L-1);
        else
            psi(L1:R) = 1 ./ (Lambda(L1:R).^2);   % second case of Lemma 5.4.3
            C(L1-1) = C(R) - sum(log(psi(L1:R)));
            R = L1 - 1;
        end
    end

Figure 5-2: A Matlab function to solve (5.49).
Based on Lemma 5.4.3, we can use the following strategy to solve (5.49). We form the geometric means described in Lemma 5.4.3 and we evaluate $l$. If $\gamma_l \ge 1/\lambda_{H,l}^2$, then we set $\psi_i = \gamma_l$ for $i \le l$, and we simplify (5.49) by removing $\psi_i$, $1 \le i \le l$, from the problem. If $\gamma_l < 1/\lambda_{H,l}^2$, then we set $\psi_i = 1/\lambda_{H,i}^2$ for $i \ge l$, and we simplify (5.49) by removing $\psi_i$, $l \le i \le K$, from the problem. The Matlab code TCDPow implementing this algorithm appears in Figure 5-2.
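For readers without Matlab, the strategy above can be sketched in plain Python. This is an illustrative port under our own assumptions, not the author's code: the name `tcd_pow`, the 0-based window variables `lo` and `hi`, and the guard on the left edge of the window are ours.

```python
import math

def tcd_pow(beta, lam):
    """Sketch of the strategy behind (5.49): minimize sum(psi) subject to
    prod(psi[:k]) >= prod(beta[:k]) for every k and psi[i] >= 1/lam[i]**2."""
    n = len(beta)
    psi = [0.0] * n
    # c[k] holds log(prod(beta[lo..k])) relative to the current left end lo.
    c, running = [], 0.0
    for b in beta:
        running += math.log(b)
        c.append(running)
    lo, hi = 0, n - 1  # active window, 0-based and inclusive
    while hi >= lo:
        # Index of the largest geometric mean inside the window, cf. (5.53).
        best, best_log = lo, c[lo]
        for k in range(lo + 1, hi + 1):
            log_mean = c[k] / (k - lo + 1)
            if log_mean > best_log:
                best, best_log = k, log_mean
        gamma = math.exp(best_log)
        if gamma >= 1.0 / lam[best] ** 2:
            # First case of Lemma 5.4.3: the leading block is set to gamma.
            for i in range(lo, best + 1):
                psi[i] = gamma
            offset = c[best]
            lo = best + 1
            for i in range(lo, hi + 1):
                c[i] -= offset
        else:
            # Second case: the trailing block sits on its lower bound.
            log_removed = 0.0
            for i in range(best, hi + 1):
                psi[i] = 1.0 / lam[i] ** 2
                log_removed += math.log(psi[i])
            if best > lo:
                # The remaining entries must supply the rest of the product.
                c[best - 1] = c[hi] - log_removed
            hi = best - 1
    return psi
```

As a small usage example with made-up inputs, `tcd_pow([2.0, 18.0], [1.0, 0.25])` takes the second branch first (the largest geometric mean, 6, is below the bound 16) and returns approximately `[2.25, 16.0]`, a two-level allocation in which only the last entry sits on its lower bound.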

After obtaining the power loading levels $\phi_i = \psi_i - 1/\lambda_{H,i}^2$, $1 \le i \le K$, we calculate the precoder $\mathbf{F}$ and the nulling vectors $\{\mathbf{w}_i\}_{i=1}^{L}$ according to Table 5-1 in Section 5.3. Note that one of the possible paths through the TCDPow routine makes the leading elements of $\psi$ all equal while setting the trailing elements to $\psi_i = 1/\lambda_{H,i}^2$. This path coincides with the standard water filling algorithm. In this case, the TCD scheme is optimal in terms of maximizing the overall throughput given the input power. On the other hand, if some substream has a very high prescribed SINR such that the $l$ given in (5.53) is less than the "break point" $j$, then $\psi$ leads to a multi-level water filling power allocation, which suffers from overall capacity loss. This happens when the target rate vector $[R_1, \ldots, R_L]$ falls out of the convex hull spanned by the $L!$ permutations of $[C_1, \ldots, C_K, 0, \ldots, 0]$ (cf. Figure 5-1), where $C_k$, $k = 1, \ldots, K$, are the capacities of the eigen subchannels with water filling power allocation. As a remedy, one can "break" (if it is practically allowable) the oversized substream into several substreams with smaller rates, or equivalently, lower SINR requirements. Note that TCD can decompose a MIMO channel into an arbitrarily large number of subchannels.

An interesting special case is $\rho_1 = \rho_2 = \cdots = \rho_L$, i.e., all substreams share the same SINR requirement. In this case, $\beta_1 \le \beta_2 \le \cdots \le \beta_K$ since the singular values $\{\lambda_{H,i}\}_{i=1}^{K}$ are in nonincreasing order, and TCDPow yields a standard water filling solution. In this case, TCD becomes UCD.

We present two numerical examples to conclude this section. In the first example, we assume Rayleigh independent flat fading channels with $M_t = 5$ and $M_r = 6$. We consider equal QoS requirements for $L = 5$ independent substreams. Figure 5-3 compares the input power needed by our TCD scheme and the linear transceiver scheme of [28]. Our scheme can save about 2.5 dB for any prescribed output SINR.


[Figure 5-3 (plot omitted): $M_r = 6$, $M_t = 5$ i.i.d. Rayleigh flat fading; horizontal axis: prescribed output SINR (dB).]

Figure 5-3: Input SNR vs. output SINR. The result is based on the average of 500 Monte Carlo trials of an i.i.d. Rayleigh flat fading channel with $M_t = 5$ and $M_r = 6$.

In the second example, we consider a rank two MIMO channel with singular values $\lambda_1, \lambda_2$. Suppose we want to decompose the MIMO channel into 2 subchannels with capacities $C_1$ and $C_2$ with $C_1 + C_2 = 10$ bps/Hz. We consider the three scenarios with $(\lambda_1 = 2, \lambda_2 = 1)$, $(\lambda_1 = 5, \lambda_2 = 1)$, and $(\lambda_1 = 10, \lambda_2 = 1)$. For all three cases, there is an inflection point beyond which our TCD is the same as the linear design of [28]. That is because when the two subchannels have very disparate QoS constraints, i.e., $C_1$ is far larger than $C_2$, the optimal strategy is to apply SVD to the channel matrix and transmit data through the orthogonal eigen-subchannels. (In this case, $\mathbf{\Omega} = \mathbf{I}$ (cf. (5.13)).) If the subchannels' QoS constraints are not too disparate, which corresponds to the region to the left of the inflection point, the required input power of our TCD scheme is invariant with respect to $C_1, C_2$ and is strictly less than that needed by the linear design. This region corresponds to the capacity lossless region (cf. Figure 5-1). Another interesting point is that the relative advantage of TCD is more prominent if the singular values $\lambda_1, \lambda_2$ become more disparate.


[Figure 5-4 (plot omitted): horizontal axis: $C_1 = 10 - C_2$.]

Figure 5-4: Input SNR vs. $C_1$. A rank 2 channel is decomposed into two subchannels with capacities $C_1$ and $C_2 = 10 - C_1$.

5.5    CDMA  Sequence  Design 

As we have shown in Section 2.1.1, the CDMA sequence design problem can be viewed as a special case of the MIMO transceiver design. In an idealized S-CDMA system where the channel does not experience any fading or near-far effect, $L$ mobile users modulate their information symbols via spreading sequences $\{\mathbf{s}_i\}_{i=1}^{L}$, each of which has the processing gain $N$. The discrete-time baseband S-CDMA signal received at the (single-antenna) base station can be represented as [31]

$$\mathbf{y} = \mathbf{S}\mathbf{x} + \mathbf{z}, \qquad (5.54)$$

where $\mathbf{S} = [\mathbf{s}_1, \ldots, \mathbf{s}_L] \in \mathbb{R}^{N \times L}$ and the $l$-th ($1 \le l \le L$) entry of $\mathbf{x}$, $x_l$, stands for the information symbol from the $l$-th user. In the downlink channel, the base station multiplexes the information dedicated to the $L$ mobile users through the spreading sequences, which are the columns of $\mathbf{S}$. Then, all the mobiles receive the same signal given in (5.54). We remark that (5.54) can also be written as (4.1) with $\mathbf{H} = \mathbf{I}_{M_r}$ and $\mathbf{F} = \mathbf{S}$. Here $M_r = M_t = N$ is the processing gain. Hence, optimizing the spreading sequences amounts to optimizing the precoder $\mathbf{F}$ for a MIMO system. Indeed, due to the simple channel matrix ($\mathbf{H} = \mathbf{I}$), some procedures of the TCD scheme can be simplified. We shall show that the TCD scheme turns out to be an improved solution to the sequence design proposed in [56]. At the end of this section, we will compare our TCD scheme and the scheme proposed in [56].

5.5.1  CDMA  Sequences  Maximizing  Sum  Capacity 

Recall that the precoder maximizing the overall MIMO channel capacity is $\mathbf{F} = \mathbf{V}\mathbf{\Phi}^{1/2}\mathbf{\Omega}^T$, where $\mathbf{\Phi}$ is obtained by the water filling algorithm. For an S-CDMA channel, $\mathbf{H} = \mathbf{I}$, so $\mathbf{V} = \mathbf{I}$ and the optimal power loading is the uniform power allocation. Hence the CDMA sequences maximizing the sum capacity are $\mathbf{S} = \sqrt{p}\,\mathbf{\Omega}^T$. Since $\mathbf{\Omega}$ has orthonormal columns, we obtain $\mathbf{S}\mathbf{S}^T = p\mathbf{I}$. This observation coincides with the findings in [31], in which the authors show that the CDMA sequences maximizing the sum capacity are the Welch-Bound-Equality sequences.
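The identity $\mathbf{S}\mathbf{S}^T = p\mathbf{I}$ is easy to check numerically. The sketch below is only an illustration: the $3 \times 2$ matrix `OMEGA` (three unit vectors 120 degrees apart, rescaled so that its columns are orthonormal), the value `p = 5.0`, and the helper names are all hypothetical choices of ours, not taken from the text.

```python
import math

_H = math.sqrt(3.0) / 2.0
_SCALE = math.sqrt(2.0 / 3.0)
# OMEGA is L x N (here 3 x 2) with orthonormal columns.
OMEGA = [[_SCALE * 1.0, _SCALE * 0.0],
         [_SCALE * -0.5, _SCALE * _H],
         [_SCALE * -0.5, _SCALE * -_H]]

def wbe_sequences(omega, p):
    """Return S = sqrt(p) * omega^T as a list of N rows of length L."""
    n_users, gain = len(omega), len(omega[0])
    root = math.sqrt(p)
    return [[root * omega[l][n] for l in range(n_users)] for n in range(gain)]

def gram_rows(s):
    """Return S S^T for a matrix stored as a list of rows."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in s] for r1 in s]

S = wbe_sequences(OMEGA, 5.0)
G = gram_rows(S)  # expected to be close to 5 times the 2 x 2 identity
```

Each column of `S` is one user's spreading sequence; in this toy case all three columns carry the same power, which matches the uniform power allocation described above.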

5.5.2  Uplink  Case 

For the uplink scenario, i.e., the mobiles to base station case, the base station calculates the optimal CDMA sequences for each mobile user and the associated successive nulling vectors needed by itself. Then the base station informs the mobile users of their designated CDMA sequences.

First, we need to calculate the power loading levels $\mathbf{\Phi} \in \mathbb{R}^{N \times N}$ such that the following GTD matrix decomposition is possible:

$$\mathbf{\Phi}_a = \mathbf{Q}\mathbf{R}\mathbf{P}^T, \qquad (5.55)$$

where the diagonal elements of $\mathbf{R}$, $r_{ii}$, $i = 1, 2, \ldots, L$, satisfy the QoS constraints. Note that the singular values of $\mathbf{\Phi}_a$ form a sequence whose first $N$ elements are $\sqrt{1+\phi_i}$, $i = 1, 2, \ldots, N$, followed by $L - N$ ones. From Theorem 5.2.3, (5.55) exists if and only if

$$\bigl(\{1+\phi_i\}_{i=1}^{N}, 1, \ldots, 1\bigr) \succ_\times \{1+\rho_i\}_{i=1}^{L}. \qquad (5.56)$$

Similar to (5.47), we need to solve the problem

$$\min_{\mathbf{\Phi}} \sum_{i=1}^{N}\phi_i$$
$$\text{subject to}\quad \bigl(\{1+\phi_i\}_{i=1}^{N}, 1, \ldots, 1\bigr) \succ_\times \{1+\rho_i\}_{i=1}^{L}, \qquad (5.57)$$
$$\phi_i \ge 0, \ \forall i.$$

Similar to (5.49), (5.57) can be further simplified using the variables

$$\psi_i = \phi_i + 1, \quad \beta_i = 1 + \rho_i \ \text{for } i < N, \quad\text{and}\quad \beta_N = \prod_{i=N}^{L}(1+\rho_i).$$

The simplified problem is

$$\min_{\psi} \sum_{i=1}^{N}\psi_i \qquad (5.58)$$
$$\text{subject to}\quad \prod_{i=1}^{k}\psi_i \ge \prod_{i=1}^{k}\beta_i, \quad \psi_k \ge 1, \quad 1 \le k \le N.$$

The algorithm TCDPow simplifies immensely when we apply it to (5.58). Since $\beta_i > 1 = 1/\lambda_{H,i}^2$ for all $i$, the constraints $\psi_i \ge 1$ are inactive. Since $\beta_i \le \beta_{i-1}$ for all $i < N$, the geometric means satisfy $\gamma_i \le \gamma_{i-1}$ for all $i < N$. Hence, in Lemma 5.4.3, the value of $l$ is either 1 or $N$. If $l = 1$, then we set $\psi_1 = \beta_1$ and we remove $\psi_1$ from the problem. If $l = N$, then $\psi_i = \gamma_N$ for all $i$. It follows that there exists an index $j$ with the property that

$$\psi_i = \beta_i \ \text{for all } i \le j \quad\text{and}\quad \psi_i = \Bigl(\prod_{t=j+1}^{N}\beta_t\Bigr)^{1/(N-j)} \ \text{for all } i > j.$$

This observation coincides with the solution obtained in [56].
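The simplified uplink rule can be sketched in a few lines of Python. The sketch assumes, as the argument above does, that the SINR targets are listed in nonincreasing order and that $\lambda_{H,i} = 1$; the function name `uplink_power_levels` and its argument names are hypothetical.

```python
import math

def uplink_power_levels(rho_db, n):
    """Sketch of the closed-form solution of (5.58).

    rho_db: prescribed SINRs in dB, assumed sorted in nonincreasing order.
    n:      processing gain N, assumed not larger than len(rho_db).
    Returns psi_1..psi_N; the power loading levels are phi_i = psi_i - 1.
    """
    rho = [10.0 ** (r / 10.0) for r in rho_db]
    # beta_i = 1 + rho_i for i < N; beta_N collects the remaining users.
    beta = [1.0 + r for r in rho[:n - 1]]
    beta.append(math.prod(1.0 + r for r in rho[n - 1:]))
    psi = []
    j = 0
    while j < n:
        tail = beta[j:]
        gamma_tail = math.exp(sum(math.log(b) for b in tail) / len(tail))
        if gamma_tail >= beta[j]:
            # l = N for the reduced problem: equalize all remaining entries.
            psi.extend([gamma_tail] * len(tail))
            break
        # l = 1 for the reduced problem: fix psi_{j+1} = beta_{j+1}, drop it.
        psi.append(beta[j])
        j += 1
    return psi

psi = uplink_power_levels([20, 19, 18, 17], 3)
total_power = sum(p - 1.0 for p in psi)
```

For the targets 20, 19, 18, and 17 dB with $N = 3$ (the setting of Section 5.5.4), all three $\psi_i$ come out equal and `total_power` agrees with the value 892.7274 quoted there to within rounding.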

Let $\mathbf{\Psi}$ denote an $L \times L$ identity matrix with its first $N$ diagonal elements replaced by $\psi_i$, $1 \le i \le N$. According to the TCD scheme presented in Section 5.3.1, we then apply the GTD algorithm to $\mathbf{\Psi}^{1/2}$ to obtain

$$\mathbf{\Psi}^{1/2} = \mathbf{Q}_\Phi\mathbf{R}_\Phi\mathbf{P}_\Phi^T. \qquad (5.59)$$

According to (5.25),

$$\mathbf{S} = \mathbf{F} \qquad (5.60)$$

Let

$$[\mathbf{v}_1, \ldots, \mathbf{v}_L] = [\mathbf{\Phi}^{1/2} \,\vdots\, \mathbf{0}_{N \times (L-N)}]\,\mathbf{\Psi}^{-1/2}\mathbf{Q}_\Phi. \qquad (5.61)$$

By (5.26) and (5.27), the nulling vectors used at the base station are

$$\mathbf{w}_i = r_{\Phi,ii}^{-1}\mathbf{v}_i, \quad i = 1, 2, \ldots, L, \qquad (5.62)$$

where $r_{\Phi,ii}$ is the $i$-th diagonal element of $\mathbf{R}_\Phi$. In summary, the base station needs to run the following three steps:

1. Solve the optimization problem (5.58).

2. Apply the GTD algorithm to $\mathbf{\Psi}^{1/2}$ in (5.59).

3. Obtain the spreading sequences for all mobile users, $[\mathbf{s}_1, \ldots, \mathbf{s}_L] = \mathbf{S}$, and the nulling vectors $\{\mathbf{w}_i\}_{i=1}^{L}$ (cf. (5.60) and (5.62)) for the base station.

5.5.3  Downlink  Case 

In the downlink case, the mobiles cannot cooperate with each other for decision feedback. Hence the VBLAST detection is impractical at the receivers. However, we can apply TCD-DP as introduced in Section 5.3.2 to cancel out known interference at the transmitter, i.e., the base station. We can convert the downlink problem into an uplink one and exploit the downlink-uplink duality as we have done in Section 5.3.2. Note that $\mathbf{H} = \mathbf{H}^* = \mathbf{I}$, i.e., the downlink and uplink channels are the same. Consider the case where the uplink and downlink communications are symmetric, i.e., for each mobile user, the QoS of the communications from the user to the base station and from the base station to the user are the same. After obtaining the spreading sequences $[\mathbf{s}_1, \ldots, \mathbf{s}_L]$ for the mobile users and the nulling vectors $[\mathbf{w}_1, \ldots, \mathbf{w}_L]$ used at the base station for the uplink case, we immediately know that in the downlink case the spreading sequences transmitted from the base station are exactly $[\mathbf{w}_1, \ldots, \mathbf{w}_L]$ and the nulling vectors used at the mobiles are the spreading sequences, $[\mathbf{s}_1, \ldots, \mathbf{s}_L]$, used in the uplink case. The only parameters we need to calculate are $g_1, \ldots, g_L$ (cf. (5.37)). Hence in this symmetric case, the base station only needs to inform the mobiles of their designated spreading sequences once in the two-way communications. Each mobile uses the same sequence for both data transmission in the uplink channel and interference nulling in the downlink channel.

5.5.4  Numerical  Example 

We present one numerical example to show how TCD can be applied to CDMA sequence design. We consider an example where there are $L = 4$ mobile users and the processing gain is $N = 3$. The prescribed SINRs of the four users are 20, 19, 18, and 17 dB, respectively. For the uplink case, we apply the TCD-VBLAST scheme to obtain the spreading sequences of the four users as the columns of the matrix

$$\mathbf{S} = \begin{pmatrix} 10.0000 & -12.0745 & -6.4974 & -3.0926 \\ 0 & 0 & 7.4138 & -15.5760 \\ 0 & 8.8312 & -13.3801 & -6.3686 \end{pmatrix}. \qquad (5.63)$$

The nulling vectors used by the base station are the columns of the matrix

$$\mathbf{W} = \begin{pmatrix} 0.0990 & -0.0015 & -0.0037 & -0.0104 \\ 0 & 0 & 0.1157 & -0.0522 \\ 0 & 0.1098 & -0.0077 & -0.0213 \end{pmatrix}. \qquad (5.64)$$

We note that for this uplink scenario, the base station detects the fourth mobile user, whose spreading sequence corresponds to the fourth column of $\mathbf{S}$, first and the first user last.

If the prescribed SINRs of the four users remain the same in the downlink scenario, the spreading sequences used by the base station are the four columns of the matrix

$$\mathbf{F} = \begin{pmatrix} 17.1936 & -0.2303 & -0.5154 & -1.2796 \\ 0 & 0 & 16.0012 & -6.4449 \\ 0 & 17.0149 & -1.0614 & -2.6352 \end{pmatrix}. \qquad (5.65)$$

In this case, the base station applies the dirty paper precoder to the first mobile user first and the last user last. Note that the columns of $\mathbf{F}$ and of $\mathbf{W}$ in (5.64) are the same up to a scaling factor. Moreover, $\operatorname{tr}(\mathbf{F}\mathbf{F}^T) = \operatorname{tr}(\mathbf{S}\mathbf{S}^T) = 892.7274$, which means that the power consumed in the base station equals the overall power used by the four mobile users. At the mobile end, the users use the nulling vectors

$$\mathbf{S} = \begin{pmatrix} 10.0000 & -12.0745 & -6.4974 & -3.0926 \\ 0 & 0 & 7.4138 & -15.5760 \\ 0 & 8.8312 & -13.3801 & -6.3686 \end{pmatrix}, \qquad (5.66)$$

which are exactly the spreading sequences used in the uplink scenario. Scaling the output signals may be necessary at the mobile ends for the subsequent dirty paper decoder. But the signal scaling does not influence the output SINR.

If zeros are not allowed in the spreading sequences, we can left-multiply $\mathbf{S}$ and $\mathbf{W}$ by a $3 \times 3$ orthogonal matrix to eliminate the zeros in $\mathbf{S}$.
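The power figures quoted in this example can be checked directly from the printed entries. The short sketch below (the helper name `total_power` is ours) sums the squared entries of the matrices in (5.63) and (5.65); because the entries are rounded to four decimals, agreement with 892.7274 is only expected to about two decimals.

```python
# Entries copied from (5.63) and (5.65); rows are listed as printed.
S = [[10.0000, -12.0745, -6.4974, -3.0926],
     [0.0, 0.0, 7.4138, -15.5760],
     [0.0, 8.8312, -13.3801, -6.3686]]
F = [[17.1936, -0.2303, -0.5154, -1.2796],
     [0.0, 0.0, 16.0012, -6.4449],
     [0.0, 17.0149, -1.0614, -2.6352]]

def total_power(m):
    """Return tr(M M^T), i.e. the sum of the squared entries of M."""
    return sum(x * x for row in m for x in row)

uplink_power = total_power(S)    # overall power of the four mobiles
downlink_power = total_power(F)  # power used by the base station
```

Both sums land within a few thousandths of 892.7274, consistent with the statement that the base station and the four mobiles together use the same total power.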
5.5.5 Further Remarks

The TCD scheme, which was originally motivated by MIMO transceiver designs, turns out to be similar to the scheme of [56] in several aspects. Both schemes are based on nonlinear decision feedback operations. Hence both are optimal in terms of maximizing the channel throughput and minimizing the overall input power. Both the GTD algorithm, on which the TCD scheme is based, and the construction of the Hermitian matrix with prescribed eigenvalues and Cholesky values as done in [56] rely on the Weyl-Horn theorem. However, our TCD scheme enjoys several advantages over the scheme of [56]. First, note that if we obtain the GTD $\mathbf{H} = \mathbf{Q}\mathbf{R}\mathbf{P}^*$, where $\mathbf{R}$ has the prescribed diagonal elements, then it follows immediately that $\mathbf{A} = \mathbf{P}^*\mathbf{H}^*\mathbf{H}\mathbf{P} = \mathbf{R}^*\mathbf{R}$ is the desired Cholesky decomposition. However, the information associated with $\mathbf{Q}$ is lost in the Cholesky decomposition. Hence the nulling vectors used at the receivers of [56] cannot be calculated explicitly as our TCD does (cf. (5.27)). Furthermore, the correlation matrix $\mathbf{A}$ is only an intermediate result. To get the CDMA sequences, one has to decompose $\mathbf{A} = \mathbf{R}^*\mathbf{R}$ explicitly. The TCD scheme, however, can be used to obtain both the precoder (CDMA sequences), which are the columns of $\mathbf{P}$, and the equalizer from $\mathbf{Q}$ simultaneously. Second, our TCD scheme is a solution to the more general MIMO transceiver design problem. The Cholesky decomposition algorithm provided in Appendix C of [56] is only applicable to the scenario where the singular values take only two distinct values. Hence it is not applicable to the general design of MIMO transceivers. The more general Cholesky factorization algorithm suggested in the proofs is computationally far less efficient. Third, the TCD scheme has two implementation forms, i.e., TCD-VBLAST and TCD-DP, which makes it applicable to both the downlink and uplink scenarios. Finally, the TCD scheme provides insights that identify the CDMA sequence design problem as a special case of the MIMO transceiver design.

5.6  Conclusions 

Based  on  the  recently  developed  GTD  matrix  decomposition  algorithm,  we  have 
proposed  the  TCD  scheme  utilizing  the  CSIT  and  CSIR.  TCD  can  be  used  to  decompose 
a  MIMO  channel  into  multiple  subchannels  with  prescribed  capacities.  The  TCD  scheme 
has  two  implementation  forms.  One  is  the  combination  of  a  linear  precoder  and  a 
minimum  mean-squared-error  VBLAST  (MMSE-VBLAST)  detector,  which  is  referred 
to  as  TCD-VBLAST,  and  the  other  includes  a  dirty  paper  (DP)  precoder  and  a  linear 
equalizer  followed  by  a  DP  decoder,  which  we  refer  to  as  TCD-DP.  Both  forms  of  TCD 
are computationally very efficient. We have also determined the subchannel capacity
region  such  that  a  capacity  lossless  decomposition  is  possible.  The  applications  of  the 
TCD  scheme  for  MIMO  communications  with  QoS  constraints  have  been  investigated. 
We  have  also  identified  the  problems  of  designing  precoders  for  OFDM  communications 
and  designing  CDMA  sequences  as  special  cases  in  the  unifying  framework  of  MIMO 
transceiver  designs.  In  particular,  we  have  shown  that  the  CDMA  sequence  design 
problem  in  the  uplink  and  downlink  scenarios  can  be  solved  using  TCD-VBLAST  and 
TCD-DP,  respectively. 

Appendix  A 
Proof  of  Theorem  5.4.1 

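Observe that for $\mathbf{F} = \mathbf{V}\mathbf{\Phi}^{1/2}$, we have

$$\operatorname{tr}(\mathbf{F}\mathbf{F}^*) = \sum_{i=1}^{K}\phi_i \quad\text{and}\quad \mathbf{H}\mathbf{F} = \mathbf{U}\mathbf{\Lambda}\mathbf{\Phi}^{1/2}. \qquad (5.67)$$

Hence, $\lambda_{HF,i}^2 = \lambda_{H,i}^2\phi_i$ for $1 \le i \le K$, and

$$\lambda_{G_a,i} = \begin{cases} \sqrt{1+\lambda_{H,i}^2\phi_i}, & 1 \le i \le K, \\ 1, & i > K. \end{cases}$$

Since $1 + \rho_i \ge 1$, the last $L - K$ inequalities in the multiplicative majorization condition in (5.46) are implied by the single equality constraint in (5.47). Hence, the problem (5.46) reduces to (5.47) where $\mathbf{F} = \mathbf{V}\mathbf{\Phi}^{1/2}$, which gives an upper bound for the minimum in (5.46).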

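Let $\mathbf{F} = \mathbf{U}_F\mathbf{\Phi}^{1/2}\mathbf{V}_F^*$ denote the singular value decomposition for any given $\mathbf{F} \in \mathbb{C}^{M_t \times L}$. Once again, $\operatorname{tr}(\mathbf{F}\mathbf{F}^*)$ is given by the sum in (5.67). By [60, Theorem 3.3.14], the singular values of the product $\mathbf{H}\mathbf{F}$ of two matrices are multiplicatively majorized by the product of the singular values of $\mathbf{H}$ and $\mathbf{F}$:

$$\prod_{i=1}^{k}\lambda_{H,i}^2\phi_i \ge \prod_{i=1}^{k}\lambda_{HF,i}^2, \quad 1 \le k \le K. \qquad (5.68)$$

Taking logs, we have

$$\sum_{i=1}^{k}\log(\lambda_{H,i}^2\phi_i) \ge \sum_{i=1}^{k}\log(\lambda_{HF,i}^2), \quad 1 \le k \le K. \qquad (5.69)$$

By [60, Lemma 3.3.8] and (5.69),

$$\sum_{i=1}^{k} f\bigl(\log(\lambda_{H,i}^2\phi_i)\bigr) \ge \sum_{i=1}^{k} f\bigl(\log(\lambda_{HF,i}^2)\bigr), \quad 1 \le k \le K, \qquad (5.70)$$

whenever $f$ is a real-valued, increasing convex function. The function $f(t) = \log(e^t + 1)$ is convex since its second derivative is positive. Making this choice for $f$ in (5.70) and exponentiating both sides, we obtain:

$$\prod_{i=1}^{k}(\lambda_{H,i}^2\phi_i + 1) \ge \prod_{i=1}^{k}(\lambda_{HF,i}^2 + 1), \quad 1 \le k \le K. \qquad (5.71)$$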
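The convexity claim and the passage from (5.70) to (5.71) can be spelled out in two lines. This is only a routine check written in our own notation ($t$ is a dummy variable), not part of the original proof text.

```latex
% f is increasing and convex:
f(t) = \log(e^{t}+1), \qquad
f'(t) = \frac{e^{t}}{e^{t}+1} > 0, \qquad
f''(t) = \frac{e^{t}}{(e^{t}+1)^{2}} > 0 .

% Evaluating f at a logarithm and exponentiating the sum:
f\bigl(\log(\lambda_{H,i}^{2}\phi_i)\bigr) = \log(\lambda_{H,i}^{2}\phi_i + 1), \qquad
\exp\Bigl(\sum_{i=1}^{k}\log(\lambda_{H,i}^{2}\phi_i + 1)\Bigr)
  = \prod_{i=1}^{k}(\lambda_{H,i}^{2}\phi_i + 1).
```

The same two steps applied to $\lambda_{HF,i}^2$ give the right-hand side of the product inequality.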

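Since $\mathbf{F}$ is feasible in (5.46),

$$\prod_{i=1}^{k}(\lambda_{HF,i}^2 + 1) \ge \prod_{i=1}^{k}(\rho_i + 1), \quad 1 \le k < K,$$
$$\prod_{i=1}^{K}(\lambda_{HF,i}^2 + 1) = \prod_{i=1}^{L}(\rho_i + 1).$$

Combining this with (5.71), we get

$$\prod_{i=1}^{k}(\lambda_{H,i}^2\phi_i + 1) \ge \prod_{i=1}^{k}(\rho_i + 1), \quad 1 \le k < K,$$
$$\prod_{i=1}^{K}(\lambda_{H,i}^2\phi_i + 1) \ge \prod_{i=1}^{L}(\rho_i + 1). \qquad (5.72)$$

Since $\lambda_{H,i}^2\phi_i + 1$ is the square of the $i$-th singular value of the augmented matrix $\mathbf{G}_a$ corresponding to the choice $\mathbf{F} = \mathbf{V}\mathbf{\Phi}^{1/2}$, we conclude that $\mathbf{F} = \mathbf{V}\mathbf{\Phi}^{1/2}$ satisfies all the inequality constraints in (5.46). If the inequality (5.72) is strict, then $\phi_K$ should be decreased in order to satisfy the equality constraint in (5.47). Since decreasing $\phi_K$ only lowers $\operatorname{tr}(\mathbf{F}\mathbf{F}^*)$, we deduce that the minimum in (5.46) is achieved by a matrix of the form $\mathbf{F} = \mathbf{V}\mathbf{\Phi}^{1/2}$. If $\mathbf{F} = \mathbf{V}\mathbf{\Phi}^{1/2}$ is optimal in (5.46), then so is $\mathbf{F} = \mathbf{V}\mathbf{\Phi}^{1/2}\mathbf{\Omega}^*$ whenever $\mathbf{\Omega}$ has $K$ orthonormal columns (since the constraints are satisfied and the value of the cost does not change). We now make the choice for $\mathbf{\Omega}$ given in Theorem 5.3.1. That is, if $\mathbf{Q}\mathbf{R}\mathbf{P}^*$ is the GTD of the matrix in (5.19), where $\mathbf{\Phi}$ is a solution of (5.47), then $\mathbf{\Omega}$ is the matrix formed by the first $K$ columns of $\mathbf{P}$. For this choice of $\mathbf{\Omega}$, the constraints of (5.45) are satisfied. As noted earlier, the minimum in (5.45) can be no smaller than the minimum in (5.46). Since this choice for $\mathbf{F}$ yields the same cost in both (5.45) and (5.46), we conclude that $\mathbf{F} = \mathbf{V}\mathbf{\Phi}^{1/2}\mathbf{\Omega}^*$ is optimal in (5.45).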



Appendix  B 
Proof  of  Lemma  5.4.3 

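First suppose that $\gamma_l \ge 1/\lambda_{H,l}^2$. By the arithmetic/geometric mean inequality, the problem

$$\min \sum_{i=1}^{l}\psi_i \quad\text{subject to}\quad \prod_{i=1}^{l}\psi_i \ge \prod_{i=1}^{l}\beta_i, \quad \psi \ge 0, \qquad (5.73)$$

has the solution $\psi_i = \gamma_l$, $1 \le i \le l$. Since $\lambda_{H,i}$ is a decreasing function of $i$ and $\gamma_l \ge 1/\lambda_{H,l}^2$, we conclude that $\psi_i = \gamma_l$ satisfies the constraints $\psi_i \ge 1/\lambda_{H,i}^2$ for $1 \le i \le l$. Since $l$ attains the maximum in (5.53),

$$\gamma_l^k \ge \gamma_k^k = \prod_{i=1}^{k}\beta_i$$

for all $k \le l$. Hence, by taking $\psi_i = \gamma_l$ for $1 \le i \le l$, the first $l$ inequalities in (5.49) are satisfied, with equality for $k = l$, and the first $l$ lower bound constraints $\psi_i \ge 1/\lambda_{H,i}^2$ are satisfied.

Let $\psi^*$ denote any optimal solution of (5.49). If

$$\prod_{i=1}^{l}\psi_i^* = \prod_{i=1}^{l}\beta_i = \gamma_l^l, \qquad (5.74)$$

then by the unique optimality of $\psi_i = \gamma_l$, $1 \le i \le l$, in (5.73), and by the fact that the inequality constraints in (5.49) are satisfied for $k \in [1, l]$, we conclude that $\psi_i^* = \gamma_l$ for all $i \in [1, l]$. On the other hand, suppose that

$$\prod_{i=1}^{l}\psi_i^* > \prod_{i=1}^{l}\beta_i = \gamma_l^l. \qquad (5.75)$$

We show below that this leads to a contradiction; consequently, (5.74) holds and $\psi_i^* = \gamma_l$ for $i \in [1, l]$.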

Define the quantity

$$\gamma_* = \Bigl(\prod_{i=1}^{l}\psi_i^*\Bigr)^{1/l}.$$

By (5.75), $\gamma_* > \gamma_l$. Again, by the arithmetic/geometric mean inequality, the solution of the problem

$$\min \sum_{i=1}^{l}\psi_i \quad\text{subject to}\quad \prod_{i=1}^{l}\psi_i \ge \gamma_*^l, \quad \psi \ge 0, \qquad (5.76)$$

is $\psi_i = \gamma_*$ for $i \in [1, l]$. By (5.75), $\gamma_* > \gamma_l$ and $\psi$ satisfies the inequality constraints in (5.49) for $k \in [1, l]$.

Let $M$ be the first index with the property that

$$\prod_{i=1}^{M}\psi_i^* = \prod_{i=1}^{M}\beta_i. \qquad (5.77)$$

Such an index exists since $\psi^*$ is optimal, which implies that

$$\prod_{i=1}^{K}\psi_i^* = \prod_{i=1}^{K}\beta_i.$$

First, suppose that $M \le j$, where $j$ is the break point given in Lemma 5.4.2. By complementary slackness, $\mu_i = 0$ and $\psi_i^* - \psi_{i+1}^* = \mu_i$ for $1 \le i < M$. We conclude that $\psi_i^* = \gamma_*$ for $1 \le i \le M$. By (5.77) we have

$$\gamma_*^M = \prod_{i=1}^{M}\psi_i^* = \prod_{i=1}^{M}\beta_i = \gamma_M^M.$$

It follows that

$$\gamma_M = \gamma_* > \gamma_l,$$

which contradicts the fact that $l$ achieves the maximum in (5.53).

In the case $M > j$, we have $\psi_i^* = \gamma_*$ for $1 \le i \le j$. Again, this follows by complementary slackness. However, we need to stop when $i = j$ since the lower bound constraints become active for $i > j$. In Lemma 5.4.2, we show that $\psi_i^* \ge \psi_j^* = \gamma_*$ for $i \ge j$. Consequently, we have

$$\gamma_M^M = \prod_{i=1}^{M}\beta_i = \prod_{i=1}^{M}\psi_i^* \ge \gamma_*^M > \gamma_l^M.$$

Again, this contradicts the fact that $l$ achieves the maximum in (5.53). This completes the analysis of the case where $\gamma_l \ge 1/\lambda_{H,l}^2$.

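Now consider the case $\gamma_l < 1/\lambda_{H,l}^2$. By the definition of $\gamma_l$, we have

$$\gamma_l \ge \Bigl(\prod_{i=1}^{K}\beta_i\Bigr)^{1/K} \quad\text{or}\quad \gamma_l^K \ge \prod_{i=1}^{K}\beta_i. \qquad (5.78)$$

If $j$ is the break point described in Lemma 5.4.2, then $\psi_i^* \ge \psi_j^*$ for all $i$; it follows that

$$\prod_{i=1}^{K}\psi_i^* \ge (\psi_j^*)^K. \qquad (5.79)$$

Since the product of the components of $\psi^*$ is equal to the product of the components of $\beta$, from (5.78) and (5.79) we get

$$\gamma_l^K \ge \prod_{i=1}^{K}\beta_i = \prod_{i=1}^{K}\psi_i^* \ge (\psi_j^*)^K.$$

Hence, $\gamma_l \ge \psi_j^* > 1/\lambda_{H,j}^2 \ge 1/\lambda_{H,i}^2$ for all $i \le j$. In particular, if $l \le j$, then $\gamma_l > 1/\lambda_{H,l}^2$, or, $l > j$ when $\gamma_l \le 1/\lambda_{H,l}^2$. As a consequence, $\psi_i^* = 1/\lambda_{H,i}^2$ for all $i \ge l$.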


CHAPTER  6 
NOVEL  MATRIX  DECOMPOSITIONS 


6.1  Introduction 


Given a complex matrix $\mathbf{H}$, we consider the decomposition $\mathbf{H} = \mathbf{Q}\mathbf{R}\mathbf{P}^*$, where $\mathbf{R}$ is upper triangular and $\mathbf{Q}$ and $\mathbf{P}$ have orthonormal columns. Special instances of this decomposition are
(a) the singular value decomposition (SVD) [61, 62]

$$\mathbf{H} = \mathbf{V}\mathbf{\Sigma}\mathbf{W}^*,$$

where $\mathbf{\Sigma}$ is a diagonal matrix containing the singular values on the diagonal,
(b) the Schur decomposition [63]

$$\mathbf{H} = \mathbf{Q}\mathbf{U}\mathbf{Q}^*,$$

where $\mathbf{U}$ is an upper triangular matrix with the eigenvalues of $\mathbf{H}$ on the diagonal,
(c) the QR decomposition where $\mathbf{P} = \mathbf{I}$.

In this chapter, we will introduce two novel matrix decompositions, i.e., the geometric mean decomposition (GMD) and the generalized triangular decomposition (GTD). As we introduced before, the GMD scheme and the UCD scheme are based on the GMD matrix decomposition algorithm, and the TCD is based on the GTD algorithm. The results of this chapter are motivated by the applications of designing MIMO transceivers. Interestingly, these results turn out to be also useful to the numerical analysis community.

6.2 Geometric Mean Decomposition

In this section, we present a new unitary decomposition which we call the geometric mean decomposition or GMD. Given a rank $K$ matrix $\mathbf{H} \in \mathbb{C}^{m \times n}$, it is expressed in the form $\mathbf{Q}\mathbf{R}\mathbf{P}^*$ where $\mathbf{P}$ and $\mathbf{Q}$ have orthonormal columns, and $\mathbf{R} \in \mathbb{R}^{K \times K}$ is a real upper triangular matrix with diagonal elements all equal to the geometric mean of the positive singular values:

$$r_{ii} = \bar{\sigma} = \Bigl(\prod_{\sigma_j > 0}\sigma_j\Bigr)^{1/K}, \quad 1 \le i \le K.$$

Here the $\sigma_j$ are the singular values of $\mathbf{H}$, and $\bar{\sigma}$ is the geometric mean of the positive singular values. Thus $\mathbf{R}$ is upper triangular and the nonzero diagonal elements are the geometric mean of the positive singular values.

We  were  led  to  this  decomposition  when  trying  to  optimize  the  performance  of 
multiple-input  multiple-output  (MIMO)  systems.  However,  this  decomposition  has 
arisen  recently  in  several  other  applications.  In  [64,  Prob.  26.3]  Higham  proposed 
the  following  problem: 

Develop an efficient algorithm for computing a unit upper triangular matrix with prescribed singular values $\sigma_i$, $1 \le i \le K$, where the product of the $\sigma_i$ is 1.

A  solution  to  this  problem  could  be  used  to  construct  test  matrices  with  user  specified 
singular  values. 

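The solution of Kosowski and Smoktunowicz [65] starts with the diagonal matrix $\mathbf{\Sigma}$, with $i$-th diagonal element $\sigma_i$, and applies a series of 2 by 2 orthogonal transformations to obtain a unit triangular matrix. The complexity of their algorithm is $O(K^2)$. Thus the solution given in [65] amounts to the statement

$$\mathbf{Q}_0^T\mathbf{\Sigma}\mathbf{P}_0 = \mathbf{R}, \qquad (6.1)$$

where $\mathbf{R}$ is unit upper triangular.

For general $\mathbf{\Sigma}$, where the product of the $\sigma_i$ is not necessarily 1, one can multiply $\mathbf{\Sigma}$ by the scaling matrix $\bar{\sigma}^{-1}\mathbf{I}$, apply (6.1), then multiply by $\bar{\sigma}$ to obtain the GMD of $\mathbf{\Sigma}$.

And for a general matrix $\mathbf{H}$, the singular value decomposition $\mathbf{H} = \mathbf{V}\mathbf{\Sigma}\mathbf{W}^*$ and (6.1) combine to give $\mathbf{H} = \mathbf{Q}\mathbf{R}\mathbf{P}^*$ where

$$\mathbf{Q} = \mathbf{V}\mathbf{Q}_0 \quad\text{and}\quad \mathbf{P} = \mathbf{W}\mathbf{P}_0.$$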
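For a $2 \times 2$ diagonal matrix the orthogonal factors can be written down in closed form with one pair of plane rotations. The sketch below is our own illustration of such a construction; the function name `gmd_2x2`, the helper names, and the particular rotation formulas are assumptions made for this example rather than a quotation of the algorithm in [65].

```python
import math

def matmul(a, b):
    """Product of two small matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def gmd_2x2(d1, d2):
    """GMD of diag(d1, d2), assuming d1 >= d2 > 0.

    Returns (Q, R, P) with diag(d1, d2) = Q R P^T, Q and P orthogonal, and
    both diagonal entries of R equal to the geometric mean sqrt(d1 * d2).
    """
    sigma = math.sqrt(d1 * d2)
    if d1 == d2:
        c, s = 1.0, 0.0  # already a multiple of the identity
    else:
        c = math.sqrt((sigma ** 2 - d2 ** 2) / (d1 ** 2 - d2 ** 2))
        s = math.sqrt(1.0 - c * c)
    P = [[c, -s], [s, c]]
    Q = [[c * d1 / sigma, -s * d2 / sigma],
         [s * d2 / sigma, c * d1 / sigma]]
    R = [[sigma, s * c * (d2 ** 2 - d1 ** 2) / sigma],
         [0.0, sigma]]
    return Q, R, P

Q, R, P = gmd_2x2(4.0, 1.0)
D = matmul(matmul(Q, R), transpose(P))  # should reproduce diag(4, 1)
```

For singular values 4 and 1 both diagonal entries of `R` equal 2, the geometric mean, and the product `Q R P^T` recovers the diagonal matrix up to rounding.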

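According to (3.11), we consider the problem of choosing $\mathbf{Q}$ and $\mathbf{P}$ to maximize the minimum of the $r_{ii}$:

$$\max_{\mathbf{Q},\mathbf{P}} \min\{r_{ii} : 1 \le i \le K\}$$
$$\text{subject to}\quad \mathbf{Q}\mathbf{R}\mathbf{P}^* = \mathbf{H}, \quad \mathbf{Q}^*\mathbf{Q} = \mathbf{I}, \quad \mathbf{P}^*\mathbf{P} = \mathbf{I}, \qquad (6.2)$$
$$r_{ij} = 0 \ \text{for } i > j, \quad \mathbf{R} \in \mathbb{R}^{K \times K},$$

where $K$ is the rank of $\mathbf{H}$. For any feasible $\mathbf{R}$, the product of the $|r_{ii}|$ equals the product of the positive singular values of $\mathbf{H}$, so the minimum of the $r_{ii}$ cannot exceed $\bar{\sigma}$. Since the GMD of $\mathbf{H}$ is feasible in (6.2) and attains this bound, we conclude that the GMD yields the optimal solution to (6.2).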
6.2.1 Generalized Maximin Properties

We consider the following problem:

$$\max_{\mathbf{F},\mathbf{G},\mathbf{U}} \min\{u_{ii} : 1 \le i \le K\}$$
$$\text{subject to}\quad \mathbf{G}\mathbf{U}\mathbf{F}^* = \mathbf{H}, \quad u_{ij} = 0 \ \text{for } i > j, \quad \mathbf{U} \in \mathbb{R}^{K \times K}, \qquad (6.3)$$
$$u_{ii} > 0, \quad 1 \le i \le K,$$
$$\operatorname{tr}\bigl((\mathbf{G}^*\mathbf{G})^{-1}\bigr) \le p_1, \quad \operatorname{tr}\bigl((\mathbf{F}^*\mathbf{F})^{-1}\bigr) \le p_2.$$

If $p_1 = p_2 = K$, then any $\mathbf{Q}$ and $\mathbf{P}$ feasible in (6.2) are feasible in (6.3). Hence, the problem (6.3) is less constrained than the problem (6.2) since the set of feasible matrices has been enlarged. Nonetheless, we now show that the solution to this relaxed problem is the same as the solution of the more constrained problem (6.2).

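Theorem 6.2.1 If $\mathbf{H} \in \mathbb{C}^{m \times n}$ has rank $K$, then a solution of (6.3) is given by

$$\mathbf{G} = \mathbf{Q}\sqrt{\frac{K}{p_1}}, \quad \mathbf{U} = \frac{\sqrt{p_1 p_2}}{K}\,\mathbf{R}, \quad \mathbf{F} = \mathbf{P}\sqrt{\frac{K}{p_2}},$$

where $\mathbf{Q}\mathbf{R}\mathbf{P}^*$ is the GMD of $\mathbf{H}$.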

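Proof: Let $\mathbf{V}\mathbf{\Sigma}\mathbf{W}^*$ be the singular value decomposition of $\mathbf{H}$, where $\mathbf{\Sigma} \in \mathbb{R}^{K \times K}$ contains the $K$ positive singular values of $\mathbf{H}$ on the diagonal. If $\mathbf{F}$ and $\mathbf{G}$ satisfy the constraints of (6.3), then we have

$$\mathbf{H} = \mathbf{V}\mathbf{\Sigma}\mathbf{W}^* = \mathbf{G}\mathbf{U}\mathbf{F}^*.$$

The column space of $\mathbf{G}\mathbf{U}\mathbf{F}^*$ is contained in the column space of $\mathbf{G}$. Since $\mathbf{G}$ has $K$ columns, the dimension of the column space is at most $K$. Since $\mathbf{G}\mathbf{U}\mathbf{F}^* = \mathbf{H}$ has rank $K$, the column space of $\mathbf{G}$ must coincide with the column space of $\mathbf{H}$, which is equal to the column space of $\mathbf{V}$. Hence, there exists a $K$ by $K$ invertible matrix $\mathbf{A}$ such that

$$\mathbf{G} = \mathbf{V}\mathbf{A}. \qquad (6.4)$$

In the same fashion, the column space of $\mathbf{F}$ must coincide with the column space of $\mathbf{H}^*$, which is equal to the column space of $\mathbf{W}$. And there exists a $K$ by $K$ invertible matrix $\mathbf{B}$ such that

$$\mathbf{F} = \mathbf{W}\mathbf{B}. \qquad (6.5)$$

Combining (6.4) and (6.5) with the identity $\mathbf{G}\mathbf{U}\mathbf{F}^* = \mathbf{H} = \mathbf{V}\mathbf{\Sigma}\mathbf{W}^*$ gives

$$\mathbf{A}\mathbf{U}\mathbf{B}^* = \mathbf{\Sigma}.$$

It follows that

$$\det(\mathbf{\Sigma}^*\mathbf{\Sigma}) = \det(\mathbf{B}\mathbf{U}^*\mathbf{A}^*\mathbf{A}\mathbf{U}\mathbf{B}^*) = \det(\mathbf{A}^*\mathbf{A})\det(\mathbf{B}^*\mathbf{B})\prod_{i=1}^{K}u_{ii}^2,$$

which gives

$$\min_{1 \le i \le K}|u_{ii}|^{2K} \le \prod_{i=1}^{K}|u_{ii}|^2 = \det(\mathbf{\Sigma}^*\mathbf{\Sigma})\det(\mathbf{A}^*\mathbf{A})^{-1}\det(\mathbf{B}^*\mathbf{B})^{-1}. \qquad (6.6)$$

By the constraints of (6.3), we have

$$\operatorname{tr}\bigl((\mathbf{G}^*\mathbf{G})^{-1}\bigr) = \operatorname{tr}\bigl((\mathbf{A}^*\mathbf{A})^{-1}\bigr) \le p_1,$$
$$\operatorname{tr}\bigl((\mathbf{F}^*\mathbf{F})^{-1}\bigr) = \operatorname{tr}\bigl((\mathbf{B}^*\mathbf{B})^{-1}\bigr) \le p_2.$$

By the geometric mean inequality and the fact that the determinant (trace) of a matrix is the product (sum) of the eigenvalues, a $K$ by $K$ Hermitian positive semidefinite matrix $\mathbf{S}$ satisfies

$$\det(\mathbf{S}) \le \Bigl(\frac{\operatorname{tr}(\mathbf{S})}{K}\Bigr)^{K}.$$

Using these bounds for the determinant and the trace in (6.6), we have

$$\min_{1 \le i \le K}|u_{ii}| \le \frac{\bar{\sigma}\sqrt{p_1 p_2}}{K}. \qquad (6.7)$$

Finally, it can be verified that for the choices of $\mathbf{G}$, $\mathbf{U}$, and $\mathbf{F}$ given in the statement of the theorem, the inequality (6.7) is an equality. □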

6.2.2    Implementation  Based  on  Initial  SVD 

We now give an algorithm for evaluating the GMD that starts with the singular value decomposition $H = V\Sigma W^*$. The algorithm generates a sequence of upper triangular matrices $R^{(L)}$, $1 \le L \le K$, with $R^{(1)} = \Sigma$. Each matrix $R^{(L)}$ has the following properties:

(a) $r_{ij}^{(L)} = 0$ when $i > j$ or $j > \max\{L, i\}$.

(b) $r_{ii}^{(L)} = \bar{\sigma}$ for all $i < L$, and the geometric mean of $r_{ii}^{(L)}$, $L \le i \le K$, is $\bar{\sigma}$.

We express $R^{(k+1)} = Q_k^T R^{(k)} P_k$, where $Q_k$ and $P_k$ are orthogonal for each $k$.

These orthogonal matrices are constructed using a symmetric permutation and a pair of Givens rotations. Suppose that $R^{(k)}$ satisfies (a) and (b). If $r_{kk}^{(k)} \ge \bar{\sigma}$, then let $\Pi$ be a permutation matrix with the property that $\Pi R^{(k)} \Pi$ exchanges the $(k+1)$-st diagonal element of $R^{(k)}$ with any element $r_{pp}$, $p > k$, for which $r_{pp} \le \bar{\sigma}$. If $r_{kk}^{(k)} < \bar{\sigma}$, then let $\Pi$ be chosen to exchange the $(k+1)$-st diagonal element with any element $r_{pp}$, $p > k$, for which $r_{pp} \ge \bar{\sigma}$. Let $\delta_1 = r_{kk}^{(k)}$ and $\delta_2 = r_{pp}^{(k)}$ denote the new diagonal elements at locations $k$ and $k+1$ associated with the permuted matrix $\Pi R^{(k)} \Pi$.

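Next, we construct orthogonal matrices $G_1$ and $G_2$ by modifying the elements in the identity matrix that lie at the intersection of rows $k$ and $k+1$ and columns $k$ and $k+1$. We multiply the permuted matrix $\Pi R^{(k)} \Pi$ on the left by $G_2^T$ and on the right by $G_1$. These multiplications will change the elements in the 2 by 2 submatrix at the intersection of rows $k$ and $k+1$ with columns $k$ and $k+1$. Our choice for the elements of $G_1$ and $G_2$ is shown below, where we focus on the relevant 2 by 2 submatrices of $G_2^T$, $\Pi R^{(k)} \Pi$, and $G_1$:

\[
\underbrace{\bar{\sigma}^{-1}\begin{bmatrix} c\delta_1 & s\delta_2 \\ -s\delta_2 & c\delta_1 \end{bmatrix}}_{(G_2^T)}
\underbrace{\begin{bmatrix} \delta_1 & 0 \\ 0 & \delta_2 \end{bmatrix}}_{(\Pi R^{(k)} \Pi)}
\underbrace{\begin{bmatrix} c & -s \\ s & c \end{bmatrix}}_{(G_1)}
=
\underbrace{\begin{bmatrix} \bar{\sigma} & x \\ 0 & y \end{bmatrix}}_{(R^{(k+1)})}
\tag{6.8}
\]

If $\delta_1 = \delta_2 = \bar{\sigma}$, we take $c = 1$ and $s = 0$; if $\delta_1 \ne \delta_2$, we take

\[ c = \sqrt{\frac{\bar{\sigma}^2 - \delta_2^2}{\delta_1^2 - \delta_2^2}} \quad \text{and} \quad s = \sqrt{1 - c^2}. \tag{6.9} \]

In either case,

\[ x = \frac{sc(\delta_2^2 - \delta_1^2)}{\bar{\sigma}} \quad \text{and} \quad y = \frac{\delta_1 \delta_2}{\bar{\sigma}}. \tag{6.10} \]

Since $\bar{\sigma}$ lies between $\delta_1$ and $\delta_2$, $s$ and $c$ are nonnegative real scalars.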
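As a quick sanity check of (6.8)-(6.10), consider a 2 by 2 case with $\delta_1 = 4$ and $\delta_2 = 1$. These values are chosen here purely for illustration and do not come from the text; they give $\bar{\sigma} = \sqrt{4 \cdot 1} = 2$.

```latex
\begin{aligned}
c^2 &= \frac{\bar{\sigma}^2 - \delta_2^2}{\delta_1^2 - \delta_2^2}
     = \frac{4 - 1}{16 - 1} = \frac{1}{5},
\qquad s^2 = \frac{4}{5}, \qquad sc = \frac{2}{5}, \\
x &= \frac{sc(\delta_2^2 - \delta_1^2)}{\bar{\sigma}}
   = \frac{(2/5)(1 - 16)}{2} = -3,
\qquad y = \frac{\delta_1 \delta_2}{\bar{\sigma}} = \frac{4 \cdot 1}{2} = 2, \\
R &= \begin{bmatrix} 2 & -3 \\ 0 & 2 \end{bmatrix},
\qquad \|R\|_F^2 = 4 + 9 + 4 = 17 = 4^2 + 1^2,
\qquad \det R = 4 = 4 \cdot 1 .
\end{aligned}
```

The Frobenius norm and the determinant of $R$ agree with those of $\mathrm{diag}(4, 1)$, so in this 2 by 2 case $R$ has singular values 4 and 1 while both diagonal entries equal the geometric mean 2.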

Figure 6-1 depicts the transformation from $R^{(k)}$ to $G_2^T \Pi R^{(k)} \Pi G_1$. The dashed box is the 2 by 2 submatrix displayed in (6.8). Notice that $c$ and $s$, defined in (6.9), are real scalars chosen so that

\[ c^2 + s^2 = 1 \quad \text{and} \quad (c\delta_1)^2 + (s\delta_2)^2 = \bar{\sigma}^2. \]

With these identities, the validity of (6.8) follows by direct computation. Defining $Q_k = \Pi G_2$ and $P_k = \Pi G_1$, we set

\[ R^{(k+1)} = Q_k^T R^{(k)} P_k. \tag{6.11} \]


It follows from Figure 6-1, (6.8), and the identity $\det(R^{(k+1)}) = \det(R^{(k)})$ that (a) and (b) hold for $L = k + 1$. Thus there exists a real upper triangular matrix $R^{(K)}$, with $\bar{\sigma}$ on the diagonal, and unitary matrices $Q_i$ and $P_i$, $i = 1, 2, \ldots, K - 1$, such that

\[ R^{(K)} = (Q_{K-1}^T \cdots Q_2^T Q_1^T)\,\Sigma\,(P_1 P_2 \cdots P_{K-1}). \]

Combining this identity with the singular value decomposition, we obtain $H = QRP^*$, where

\[ Q = V\Big(\prod_{i=1}^{K-1} Q_i\Big), \quad R = R^{(K)}, \quad \text{and} \quad P = W\Big(\prod_{i=1}^{K-1} P_i\Big). \]

[Figure 6-1: The operation displayed in (6.8). Sparsity patterns of $R^{(k)}$ and $G_2^T \Pi R^{(k)} \Pi G_1$, with row $k$ and column $k$ marked.]

In summary, our algorithm for computing the GMD, based on an initial SVD, is the following:

1. Let $H = V\Sigma W^*$ be the singular value decomposition of $H$, and initialize $Q = V$, $P = W$, $R = \Sigma$, and $k = 1$.

2. If $r_{kk} \ge \bar{\sigma}$, choose $p > k$ such that $r_{pp} \le \bar{\sigma}$. If $r_{kk} < \bar{\sigma}$, choose $p > k$ such that $r_{pp} \ge \bar{\sigma}$. In $R$, $P$, and $Q$, perform the following exchanges:

   $r_{k+1,k+1} \leftrightarrow r_{pp}$
   $P_{:,k+1} \leftrightarrow P_{:,p}$
   $Q_{:,k+1} \leftrightarrow Q_{:,p}$

3. Construct the matrices $G_1$ and $G_2$ shown in (6.8). Replace $R$ by $G_2^T R G_1$, replace $Q$ by $QG_2$, and replace $P$ by $PG_1$.

4. If $k = K - 1$, then stop; $QRP^*$ is the GMD of $H$. Otherwise, replace $k$ by $k + 1$ and go to step 2.
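The four steps above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: it assumes the SVD has already been computed, so it works directly on a list of positive singular values (that is, on $H = \Sigma$ with $V = W = I$). The function name `gmd_from_singular_values` and the choice of $p$ as the smallest or largest trailing diagonal entry are assumptions made for this sketch; the text allows any admissible $p$.

```python
import math

def gmd_from_singular_values(sigma):
    """Sketch of steps 1-4 for H = diag(sigma), with all sigma[i] > 0.

    Returns (Q, R, P) as lists of lists such that diag(sigma) = Q R P^T,
    R is upper triangular, and every diagonal entry of R is (up to
    round-off) the geometric mean of sigma.
    """
    K = len(sigma)
    sbar = math.exp(sum(math.log(s) for s in sigma) / K)  # geometric mean
    Q = [[float(i == j) for j in range(K)] for i in range(K)]
    P = [[float(i == j) for j in range(K)] for i in range(K)]
    R = [[float(sigma[i]) if i == j else 0.0 for j in range(K)] for i in range(K)]

    def swap_cols(M, a, b):
        for row in M:
            row[a], row[b] = row[b], row[a]

    def mix_cols(M, k, g):
        # M[:, [k, k+1]] <- M[:, [k, k+1]] * g, where g is a 2 by 2 matrix
        for row in M:
            a, b = row[k], row[k + 1]
            row[k] = a * g[0][0] + b * g[1][0]
            row[k + 1] = a * g[0][1] + b * g[1][1]

    for k in range(K - 1):
        tail = range(k + 1, K)
        # Step 2: pick p on the opposite side of the geometric mean.
        if R[k][k] >= sbar:
            p = min(tail, key=lambda i: R[i][i])
        else:
            p = max(tail, key=lambda i: R[i][i])
        # The trailing block is diagonal, so only diagonal entries move.
        R[k + 1][k + 1], R[p][p] = R[p][p], R[k + 1][k + 1]
        swap_cols(Q, k + 1, p)
        swap_cols(P, k + 1, p)
        # Step 3: rotations from (6.8)-(6.9).
        d1, d2 = R[k][k], R[k + 1][k + 1]
        denom = d1 * d1 - d2 * d2
        if denom == 0.0:
            c, s = 1.0, 0.0
        else:
            c2 = min(1.0, max(0.0, (sbar * sbar - d2 * d2) / denom))
            c, s = math.sqrt(c2), math.sqrt(1.0 - c2)
        G1 = [[c, -s], [s, c]]
        G2 = [[c * d1 / sbar, -s * d2 / sbar], [s * d2 / sbar, c * d1 / sbar]]
        mix_cols(R, k, G1)                      # R <- R G1
        top, bot = R[k][:], R[k + 1][:]
        for j in range(K):                      # R <- G2^T R
            R[k][j] = G2[0][0] * top[j] + G2[1][0] * bot[j]
            R[k + 1][j] = G2[0][1] * top[j] + G2[1][1] * bot[j]
        R[k + 1][k] = 0.0                       # exact zero, remove round-off
        mix_cols(Q, k, G2)                      # Q <- Q G2
        mix_cols(P, k, G1)                      # P <- P G1
    return Q, R, P
```

For example, `gmd_from_singular_values([4.0, 2.0, 1.0])` should return an upper triangular `R` whose diagonal entries are all close to 2, the geometric mean of the inputs. The sketch stores dense matrices for readability, so it does not reflect the $O((m+n)K)$ operation count of the algorithm.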

Given the SVD, this algorithm for the GMD requires $O((m + n)K)$ flops. For comparison, reduction of $H$ to bidiagonal form by the Golub-Reinsch bidiagonalization scheme [66, 67, 68], often the first step in the computation of the SVD, requires $O(mnK)$ flops.

6.3    Generalized  Triangular  Decomposition 

In this section, we consider the generalized decomposition of the form

H = QRP*, (6.12)

where R is upper triangular and Q and P have orthonormal columns. We answer the following two questions. First, what is the necessary and sufficient condition for the decomposition (6.12) to exist? Second, how can such a decomposition be computed? Sections 6.3.1 and 6.3.2 address these two questions, respectively.

6.3.1  Existence  of  GTD 

The following result is due to Weyl [69] (also see [60, p. 171]):

Theorem 6.3.1 If $A \in \mathbb{C}^{n \times n}$ with eigenvalues $\boldsymbol{\lambda}$ and singular values $\boldsymbol{\sigma}$, then $\boldsymbol{\lambda} \preceq \boldsymbol{\sigma}$.

The following result is due to Horn [70] (also see [60, p. 220]):

Theorem 6.3.2 If $\mathbf{r} \in \mathbb{C}^{n}$ and $\boldsymbol{\sigma} \in \mathbb{R}^{n}$ with $\mathbf{r} \preceq \boldsymbol{\sigma}$, then there exists an upper triangular matrix $R \in \mathbb{C}^{n \times n}$ with singular values $\sigma_i$, $1 \le i \le n$, and with $\mathbf{r}$ on the diagonal of $R$.

We now combine Theorems 6.3.1 and 6.3.2 to obtain:

Theorem 6.3.3 Let $H \in \mathbb{C}^{m \times n}$ have rank $K$ with singular values $\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_K > 0$. There exists an upper triangular matrix $R \in \mathbb{C}^{K \times K}$ with diagonal $\mathbf{r}$ and matrices $Q$ and $P$ with orthonormal columns such that $H = QRP^*$ if and only if $\mathbf{r} \preceq \boldsymbol{\sigma}$.

Proof: If $H = QRP^*$, then the eigenvalues of $R$ are its diagonal elements and the singular values of $R$ coincide with those of $H$. By Theorem 6.3.1, $\mathbf{r} \preceq \boldsymbol{\sigma}$. Conversely, suppose that $\mathbf{r} \preceq \boldsymbol{\sigma}$. Let $H = V\Sigma W^*$ be the singular value decomposition, where $\Sigma \in \mathbb{R}^{K \times K}$. By Theorem 6.3.2, there exists an upper triangular matrix $R \in \mathbb{C}^{K \times K}$ with the $r_i$ on the diagonal and with singular values $\sigma_i$, $1 \le i \le K$. Let $R = V_0 \Sigma W_0^*$ be the singular value decomposition of $R$. Substituting $\Sigma = V_0^* R W_0$ in the singular value decomposition for $H$, we have

\[ H = (V V_0^*)\,R\,(W W_0^*)^*. \]

In other words, $H = QRP^*$ where $Q = V V_0^*$ and $P = W W_0^*$. ■
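The existence condition can be tested numerically before attempting a decomposition. The sketch below assumes the usual form of Weyl's multiplicative majorization: after sorting both vectors by decreasing magnitude, every leading partial product of the $|r_i|$ is bounded by the corresponding product of the $\sigma_i$, and the full products are equal. The function name `is_mult_majorized` and the tolerance `tol` are hypothetical and introduced only for this example.

```python
import math

def is_mult_majorized(r, sigma, tol=1e-9):
    """Return True if r is multiplicatively majorized by sigma.

    Assumed definition: with magnitudes sorted in decreasing order, each
    leading partial product of |r| is <= that of |sigma|, and the full
    products agree (up to the relative tolerance tol).
    """
    a = sorted((abs(x) for x in r), reverse=True)
    b = sorted((abs(x) for x in sigma), reverse=True)
    if len(a) != len(b):
        return False
    pa = pb = 1.0
    for x, y in zip(a, b):
        pa *= x
        pb *= y
        if pa > pb * (1.0 + tol):   # a partial product exceeds the bound
            return False
    return math.isclose(pa, pb, rel_tol=tol)  # full products must match
```

With singular values (4, 2, 1), the constant vector (2, 2, 2) passes this check, which is the GMD case, whereas (5, 1.6, 1) fails because its largest entry exceeds the largest singular value. Complex entries of `r` are handled through their magnitudes.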

6.3.2  The  GTD  Algorithm 

Given a matrix $H \in \mathbb{C}^{m \times n}$ with rank $K$ and with singular values $\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_K > 0$, and given a vector $\mathbf{r} \in \mathbb{C}^{K}$ such that $\mathbf{r} \preceq \boldsymbol{\sigma}$, we now give an algorithm for computing the decomposition $H = QRP^*$. This algorithm for the GTD essentially yields a constructive proof of Theorem 6.3.2.

Let $V\Sigma W^*$ be the singular value decomposition of $H$, where $\Sigma$ is a $K$ by $K$ diagonal matrix with the diagonal containing the positive singular values. We let $R^{(L)} \in \mathbb{C}^{K \times K}$ denote an upper triangular matrix with the following properties:

(a) $r_{ij}^{(L)} = 0$ when $i > j$ or $j > i \ge L$. In other words, the trailing principal submatrix of $R^{(L)}$, starting at row $L$ and column $L$, is diagonal.

(b) If $\mathbf{r}^{(L)}$ denotes the diagonal of $R^{(L)}$, then the first $L - 1$ elements of $\mathbf{r}$ and $\mathbf{r}^{(L)}$ are equal. In other words, the leading diagonal elements of $R^{(L)}$ match the prescribed leading elements of the vector $\mathbf{r}$.

(c) $\mathbf{r}_{L:K} \preceq \mathbf{r}^{(L)}_{L:K}$, where $\mathbf{r}_{L:K}$ denotes the subvector of $\mathbf{r}$ consisting of components $L$ through $K$. In other words, the trailing diagonal elements of $R^{(L)}$ multiplicatively majorize the trailing elements of the prescribed vector $\mathbf{r}$.

Initially, we set $R^{(1)} = \Sigma$. Clearly, (a)-(c) hold for $L = 1$. Proceeding by induction, suppose we have generated upper triangular matrices $R^{(L)}$, $L = 1, 2, \ldots, k$, satisfying (a)-(c), and unitary matrices $Q_L$ and $P_L$ such that $R^{(L+1)} = Q_L^* R^{(L)} P_L$ for $1 \le L < k$. We now show how to construct unitary matrices $Q_k$ and $P_k$ such that $R^{(k+1)} = Q_k^* R^{(k)} P_k$, where $R^{(k+1)}$ satisfies (a)-(c) for $L = k + 1$.

Let $p$ and $q$ be defined as follows:

\[ p = \arg\min_i \{ |r_i^{(k)}| : k \le i \le K,\ |r_i^{(k)}| \ge |r_k| \}, \tag{6.13} \]
\[ q = \arg\max_i \{ |r_i^{(k)}| : k \le i \le K,\ |r_i^{(k)}| \le |r_k|,\ i \ne p \}, \tag{6.14} \]

where $r_i^{(k)}$ is the $i$-th element of $\mathbf{r}^{(k)}$. Since $\mathbf{r}_{k:K} \preceq \mathbf{r}^{(k)}_{k:K}$, there exist $p$ and $q$ satisfying (6.13) and (6.14). Let $\Pi$ be the matrix corresponding to the symmetric permutation $\Pi^* R^{(k)} \Pi$ which moves the diagonal elements $r_{pp}^{(k)}$ and $r_{qq}^{(k)}$ to the $k$-th and $(k+1)$-st diagonal positions, respectively. Let $\delta_1 = r_{pp}^{(k)}$ and $\delta_2 = r_{qq}^{(k)}$ denote the new diagonal elements at locations $k$ and $k + 1$ associated with the permuted matrix $\Pi^* R^{(k)} \Pi$.

Next, we construct unitary matrices $G_1$ and $G_2$ by modifying the elements in the identity matrix that lie at the intersection of rows $k$ and $k + 1$ and columns $k$ and $k + 1$. We multiply the permuted matrix $\Pi^* R^{(k)} \Pi$ on the left by $G_2^*$ and on the right by $G_1$. These multiplications will change the elements in the 2 by 2 submatrix at the intersection of rows $k$ and $k + 1$ with columns $k$ and $k + 1$. Our choice for the elements of $G_1$ and $G_2$ is shown below, where we focus on the relevant 2 by 2 submatrices of $G_2^*$, $\Pi^* R^{(k)} \Pi$, and $G_1$:

\[
\underbrace{\frac{r_k}{|r_k|^2}\begin{bmatrix} c\delta_1^* & s\delta_2^* \\ -s\delta_2 & c\delta_1 \end{bmatrix}}_{(G_2^*)}
\underbrace{\begin{bmatrix} \delta_1 & 0 \\ 0 & \delta_2 \end{bmatrix}}_{(\Pi^* R^{(k)} \Pi)}
\underbrace{\begin{bmatrix} c & -s \\ s & c \end{bmatrix}}_{(G_1)}
=
\underbrace{\begin{bmatrix} r_k & x \\ 0 & y \end{bmatrix}}_{(R^{(k+1)})}
\tag{6.15}
\]

If $|\delta_1| = |\delta_2| = |r_k|$, we take $c = 1$ and $s = 0$; if $|\delta_1| \ne |\delta_2|$, we take

\[ c = \sqrt{\frac{|r_k|^2 - |\delta_2|^2}{|\delta_1|^2 - |\delta_2|^2}} \quad \text{and} \quad s = \sqrt{1 - c^2}. \tag{6.16} \]

In either case,

\[ x = \frac{sc(|\delta_2|^2 - |\delta_1|^2)\,r_k}{|r_k|^2} \quad \text{and} \quad y = \frac{\delta_1 \delta_2 r_k}{|r_k|^2}. \tag{6.17} \]

Figure 6-2 depicts the transformation from $\Pi^* R^{(k)} \Pi$ to $G_2^* \Pi^* R^{(k)} \Pi G_1$. The dashed box is the 2 by 2 submatrix displayed in (6.15). Notice that $c$ and $s$, defined in (6.16), are real scalars chosen so that

\[ c^2 + s^2 = 1 \quad \text{and} \quad c^2|\delta_1|^2 + s^2|\delta_2|^2 = |r_k|^2. \tag{6.18} \]

[Figure 6-2: The operation displayed in (6.15). Sparsity patterns of $\Pi^* R^{(k)} \Pi$ and $G_2^* \Pi^* R^{(k)} \Pi G_1$, with row $k$ and column $k$ marked.]

With these identities, the validity of (6.15) follows by direct computation. By the choice of $p$ and $q$, we have

\[ |\delta_2| \le |r_k| \le |\delta_1|. \tag{6.19} \]

If $|\delta_1| \ne |\delta_2|$, it follows from (6.19) that $c$ and $s$ are real nonnegative scalars. It can be checked that the 2 by 2 matrices in (6.15) associated with $G_1$ and $G_2^*$ are both unitary. Consequently, both $G_1$ and $G_2$ are unitary. We define

\[ R^{(k+1)} = (\Pi G_2)^* R^{(k)} (\Pi G_1) = Q_k^* R^{(k)} P_k, \]

where $Q_k = \Pi G_2$ and $P_k = \Pi G_1$. By (6.15) and Figure 6-2, $R^{(k+1)}$ has properties (a) and (b) for $L = k + 1$. Now consider property (c).

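We write $\mathbf{a} \sim \mathbf{b}$ if $\mathbf{a}$ and $\mathbf{b}$ are equal after a suitable reordering of the components. Let $\mathbf{a}$, $\mathbf{b}$, $\mathbf{a}^+$, and $\mathbf{b}^+$ be vectors whose components are ordered in decreasing magnitude, and which satisfy

\[ \mathbf{a} \sim \mathbf{r}_{k:K}, \quad \mathbf{b} \sim \mathbf{r}^{(k)}_{k:K}, \quad \mathbf{a}^+ \sim \mathbf{r}_{k+1:K}, \quad \text{and} \quad \mathbf{b}^+ \sim \mathbf{r}^{(k+1)}_{k+1:K}. \tag{6.20} \]

Thus $a_i$ is the $i$-th largest (in magnitude) component of $\mathbf{r}_{k:K}$. By the induction hypothesis, we have $\mathbf{a} \preceq \mathbf{b}$. To establish (c), we need to show that $\mathbf{a}^+ \preceq \mathbf{b}^+$. Let the index $s$ be chosen so that $a_s = r_k$, and let the index $t$ be chosen so that

\[ |b_t| \ge |r_k| \ge |b_{t+1}|. \tag{6.21} \]

By the definition of $p$ and $q$, $r_p^{(k)} = b_t$ and $r_q^{(k)} = b_{t+1}$. As seen in (6.20), $\mathbf{a}^+$ is obtained from $\mathbf{a}$ by deleting $a_s = r_k$. The vector $\mathbf{r}^{(k+1)}$ is obtained from $\mathbf{r}^{(k)}$ by a unitary transformation that changes the value of two elements. In particular, $\mathbf{b}^+$ is obtained from $\mathbf{b}$ by replacing the adjacent pair $b_t$ and $b_{t+1}$ by

\[ y = \frac{b_t b_{t+1} r_k}{|r_k|^2}. \]

By (6.21), $|b_t| \ge |y| \ge |b_{t+1}|$. Consequently,

\[ b_t^+ = y. \tag{6.22} \]

We partition the proof of (c) into 2 cases.

Case 1: $s \le t$. Since $|a_i^+| \le |a_i|$ for all $i$, $\mathbf{a} \preceq \mathbf{b}$, and $b_i = b_i^+$ for $1 \le i < t$, we have

\[ \mathbf{a}^+_{1:t-1} \prec \mathbf{a}_{1:t-1} \prec \mathbf{b}_{1:t-1} = \mathbf{b}^+_{1:t-1}. \tag{6.23} \]

For $j \ge t \ge s$, it follows from the induction hypothesis and the connection between $\mathbf{a}$ and $\mathbf{a}^+$ that

\[ \prod_{i=1}^{j} |a_i^+| = \prod_{\substack{i=1 \\ i \ne s}}^{j+1} |a_i| = \frac{1}{|r_k|} \prod_{i=1}^{j+1} |a_i| \le \frac{1}{|r_k|} \prod_{i=1}^{j+1} |b_i|. \tag{6.24} \]

Since $G_1$ and $G_2$ are unitary, the determinant of (6.15) gives

\[ |b_t|\,|b_{t+1}| = |\delta_1|\,|\delta_2| = |r_k|\,|y| = |r_k|\,|b_t^+|, \tag{6.25} \]

where the last equality in (6.25) comes from (6.22). Hence, for $j \ge t$, it follows that

\[ \prod_{i=1}^{j+1} |b_i| = \Big(\prod_{i=1}^{t-1} |b_i|\Big) |b_t b_{t+1}| \Big(\prod_{i=t+2}^{j+1} |b_i|\Big) = |r_k| \Big(\prod_{i=1}^{t-1} |b_i^+|\Big) |b_t^+| \Big(\prod_{i=t+1}^{j} |b_i^+|\Big) = |r_k| \prod_{i=1}^{j} |b_i^+|. \tag{6.26} \]

Combining (6.23), (6.24), and (6.26), we have $\mathbf{a}^+ \preceq \mathbf{b}^+$.

Case 2: $s > t$. As before, (6.23) holds. For $t < j \le s$, we have

\[ |a_j| \prod_{i=1}^{j-1} |a_i^+| = \prod_{i=1}^{j} |a_i| \le \prod_{i=1}^{j} |b_i| = |r_k| \prod_{i=1}^{j-1} |b_i^+|, \]

where the first equality comes from the relation $j \le s$, the middle inequality is the induction hypothesis, and the last equality is (6.26). Rearranging this gives

\[ \Big(\frac{|a_j|}{|r_k|}\Big) \prod_{i=1}^{j-1} |a_i^+| \le \prod_{i=1}^{j-1} |b_i^+|. \tag{6.27} \]

Since $|a_j|/|r_k| = |a_j|/|a_s| \ge 1$ when $j \le s$, we deduce from (6.27) that

\[ \mathbf{a}^+_{1:j-1} \prec \mathbf{b}^+_{1:j-1} \]

when $j \le s$. This also holds for $j > s$ due to (6.24) and (6.26). This completes the proof of (c).

Hence, there exists an upper triangular matrix $R^{(K)}$, with $\mathbf{r}_{1:K-1}$ occupying the first $K - 1$ diagonal elements, and unitary matrices $Q_i$ and $P_i$, $i = 1, 2, \ldots, K - 1$, such that

\[ R^{(K)} = (Q_{K-1}^* \cdots Q_2^* Q_1^*)\,\Sigma\,(P_1 P_2 \cdots P_{K-1}). \tag{6.28} \]

Equating determinants in (6.28) and utilizing the identity $r_i^{(K)} = r_i$ for $1 \le i \le K - 1$, we have

\[ |r_K^{(K)}| \prod_{i=1}^{K-1} |r_i| = \prod_{i=1}^{K} |r_i^{(K)}| = \prod_{i=1}^{K} \sigma_i = \prod_{i=1}^{K} |r_i|, \]

where the last equality is due to the assumption $\mathbf{r} \preceq \boldsymbol{\sigma}$. It follows that $|r_K^{(K)}| = |r_K|$. Let $C$ be the diagonal matrix obtained by replacing the $(K, K)$ element of the identity matrix by $r_K^{(K)}/r_K$. The matrix $C$ is unitary since $|r_K|/|r_K^{(K)}| = 1$. The matrix

\[ R = C^* R^{(K)} \tag{6.29} \]

has diagonal equal to $\mathbf{r}$ due to the choice of $C$.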
Combining (6.28) and (6.29) with the singular value decomposition $H = V\Sigma W^*$ gives

\[ H = V Q_1 Q_2 \cdots Q_{K-1}\, C R\, P_{K-1}^* \cdots P_2^* P_1^* W^*. \]

Hence, we have obtained the GTD with

\[ Q = V\Big(\prod_{i=1}^{K-1} Q_i\Big) C \quad \text{and} \quad P = W\Big(\prod_{i=1}^{K-1} P_i\Big). \]

Finally, note that if $\mathbf{r}$ is real, then $G_1$ and $G_2$ are real, which implies $R$ is real.

We summarize the steps of the GTD algorithm as follows. To make it easier to distinguish between the elements of the matrix $R$ and the elements of the given diagonal vector $\mathbf{r}$, we use $R_{ij}$ to denote the $(i, j)$ element of $R$ and $r_i$ to denote the $i$-th element of $\mathbf{r}$.

1. Let $H = V\Sigma W^*$ be the singular value decomposition of $H$, and suppose we are given $\mathbf{r} \in \mathbb{C}^{K}$ with $\mathbf{r} \preceq \boldsymbol{\sigma}$. Initialize $Q = V$, $P = W$, $R = \Sigma$, and $k = 1$.

2. Let $p$ and $q$ be defined as follows:

   $p = \arg\min_i \{ |R_{ii}| : k \le i \le K,\ |R_{ii}| \ge |r_k| \}$,
   $q = \arg\max_i \{ |R_{ii}| : k \le i \le K,\ |R_{ii}| \le |r_k|,\ i \ne p \}$.

   In $R$, $P$, and $Q$, perform the following exchanges:

   $(R_{kk}, R_{k+1,k+1}) \leftrightarrow (R_{pp}, R_{qq})$
   $(R_{1:k-1,k}, R_{1:k-1,k+1}) \leftrightarrow (R_{1:k-1,p}, R_{1:k-1,q})$
   $(P_{:,k}, P_{:,k+1}) \leftrightarrow (P_{:,p}, P_{:,q})$
   $(Q_{:,k}, Q_{:,k+1}) \leftrightarrow (Q_{:,p}, Q_{:,q})$

3. Construct the matrices $G_1$ and $G_2$ shown in (6.15). Replace $R$ by $G_2^* R G_1$, replace $Q$ by $QG_2$, and replace $P$ by $PG_1$.

4. If $k = K - 1$, then go to step 5. Otherwise, replace $k$ by $k + 1$ and go to step 2.

5. Multiply column $K$ of $Q$ by $R_{KK}/r_K$; replace $R_{KK}$ by $r_K$. The product $QRP^*$ is the GTD of $H$ based on $\mathbf{r}$.
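The 2 by 2 update in step 3 is the core of the algorithm, and it can be checked numerically in isolation. The Python sketch below is an illustration under stated assumptions rather than the Appendix code: it takes $G_2^*$ to carry the scalar factor $r_k/|r_k|^2$ (the choice that makes the leading entry of the product equal $r_k$), assumes $|\delta_2| \le |r_k| \le |\delta_1|$ with $r_k \ne 0$, and uses hypothetical helper names and test values.

```python
import math

def gtd_rotation(delta1, delta2, rk):
    """Build G1, G2, x, y for one 2 by 2 GTD step.

    Assumes |delta2| <= |rk| <= |delta1| and rk != 0 (inputs may be complex).
    """
    d1, d2, r = complex(delta1), complex(delta2), complex(rk)
    a1, a2, ar = abs(d1) ** 2, abs(d2) ** 2, abs(r) ** 2
    if a1 == a2:
        c, s = 1.0, 0.0
    else:
        c = math.sqrt(min(1.0, max(0.0, (ar - a2) / (a1 - a2))))
        s = math.sqrt(1.0 - c * c)
    x = s * c * (a2 - a1) * r / ar
    y = d1 * d2 * r / ar
    w = r.conjugate() / ar          # scalar factor of G2
    G1 = [[c, -s], [s, c]]
    G2 = [[w * c * d1, -w * s * d2.conjugate()],
          [w * s * d2, w * c * d1.conjugate()]]
    return G1, G2, x, y

def matmul2(A, B):
    """Product of two 2 by 2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def ctranspose2(A):
    """Conjugate transpose of a 2 by 2 matrix."""
    return [[complex(A[j][i]).conjugate() for j in range(2)] for i in range(2)]

# Illustrative values with |d2| = 1 <= |rk| = sqrt(5) <= |d1| = 5.
d1, d2, rk = 3 + 4j, 1j, 2 - 1j
G1, G2, x, y = gtd_rotation(d1, d2, rk)
lhs = matmul2(matmul2(ctranspose2(G2), [[d1, 0], [0, d2]]), G1)
rhs = [[rk, x], [0, y]]   # lhs should agree with rhs up to round-off
```

Comparing `lhs` with `rhs` entry by entry confirms that the product is upper triangular with the prescribed value `rk` in the leading position, and `ctranspose2(G2)` multiplied by `G2` should be close to the identity.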

A Matlab implementation of our GTD algorithm appears in the Appendix. Given the SVD, this algorithm for the GTD requires $O((m + n)K)$ flops. For comparison, reduction of $H$ to bidiagonal form by the Golub-Reinsch bidiagonalization scheme [66, 67, 68], often the first step in the computation of the SVD, requires $O(mnK)$ flops.
Table 6-1 (time in seconds, singular value and eigenvalue errors in sup-norm)

             Time                  σ error                λ error
Dimension    SVD_EIG    GTD        SVD_EIG    GTD         SVD_EIG    GTD
  100          0.61      0.20      9.8e-15    1.0e-14     3.3e-14    0
  200          2.24      0.38      2.0e-14    1.7e-14     5.9e-13    0
  400         13.84      0.86      6.8e-14    3.7e-14     3.3e-13    0
  800         97.50      2.30      9.8e-14    7.0e-14     1.5e-10    0
 1200        317.83      5.67      1.1e-13    1.3e-13     1.5e-9     0
 1600        746.77     10.77      3.2e-13    1.5e-13     7.7e-4     0


6.3.3    Inverse  Eigenvalue  Problem 

In [71] Chu presents a recursive procedure for constructing matrices with prescribed eigenvalues and singular values. His algorithm, which he calls SVD_EIG, is based on Horn's divide and conquer proof of the sufficiency of Weyl's product inequalities. In general, the output of SVD_EIG is not upper triangular. Consequently, this routine could not be used to generate the GTD. Chu notes that to achieve an upper triangular matrix would require an algorithm "one order more expensive than the divide-and-conquer algorithm".

Given a vector of singular values $\boldsymbol{\sigma} \in \mathbb{R}^{n}$ and a vector of eigenvalues $\boldsymbol{\lambda} \in \mathbb{C}^{n}$, with $\boldsymbol{\lambda} \preceq \boldsymbol{\sigma}$, we can use the GTD to generate a matrix $R$ with $\boldsymbol{\lambda}$ on the diagonal and with singular values $\boldsymbol{\sigma}$. In this section, we compare the solution to the inverse eigenvalue problem provided by the GTD to Chu's algorithm. In our initial experimentation, we discovered that the algorithm of Chu, as presented in [71], did not work. When this was pointed out, Chu provided an adjustment in which the parameter $\mu$ in [71, (2.2)] was replaced by $\mu\lambda_1/|\lambda_1|$. With this adjustment, it was possible to solve 4 by 4 and 5 by 5 test cases that previously caused failure. The results reported in this section use the adjusted algorithm.

Both Matlab routines GTD (see Appendix) and SVD_EIG [71] require $O(n^2)$ flops, so in an asymptotic sense, the approaches are equivalent. In Table 6-1 we compare the actual running times of GTD and SVD_EIG for matrices of various dimensions. These computer runs were performed on a Sun Workstation with 1 GB memory. In making these runs, the portion of the GTD code connected with the updating of the matrices P and Q was deleted since SVD_EIG does not accumulate the unitary matrices. The input arrays $\boldsymbol{\sigma}$ and $\boldsymbol{\lambda}$ were generated in the following way: Using the Matlab routine RAND, we randomly generated a square matrix whose elements lie between 0 and 1. The singular values $\boldsymbol{\sigma}$ were computed using the Matlab routine SVD and the eigenvalues $\boldsymbol{\lambda}$ were computed using Matlab's EIG. By the theorem of Weyl [69], $\boldsymbol{\lambda} \preceq \boldsymbol{\sigma}$. We then used both SVD_EIG and GTD to generate matrices with the specified singular values and eigenvalues. Five different matrices of each dimension were generated and the average running time is reported in Table 6-1.

The  times  shown  in  Table  6-1  indicate  that  GTD  becomes  increasingly  more  efficient 
than  SVD_EIG  as  the  matrix  dimension  increases.  For  a  dimension  of  100,  GTD  is  about 
three  times  faster  than  SVD_EIG.  For  a  dimension  of  1600,  GTD  is  about  70  times  faster 
than  SVD_EIG.  In  an  efficiently  designed  compiled  code,  the  difference  in  speed  between 
these  two  approaches  to  inverse  eigenvalue  problems  could  be  more  substantial:  the 
permutations  appearing  in  GTD  could  be  replaced  by  the  updating  of  a  pointer  array; 
also,  the  columns  of  R  that  are  zero  except  for  the  diagonal  entry  could  be  flagged, 
and  when  multiplying  R  by  Gi,  we  can  skip  the  multiplication  of  these  essentially  zero 
columns. 

In Table 6-1 we also compare the specified singular values and eigenvalues to those obtained by applying Matlab's SVD and EIG routines to the generated matrices. That is, for each matrix output by either SVD_EIG or GTD, we use Matlab's routines to compute the singular values and eigenvalues, and we compare them to the specified singular values and eigenvalues using the sup-norm. The errors reported in Table 6-1 are the average errors for the 5 random matrices of each dimension. Both routines generate matrices with singular values that match those computed by Matlab's SVD routine to within 13 or 14 digits. Observe that GTD always matches the prescribed eigenvalues exactly, since the generated matrix is triangular with the specified eigenvalues on the diagonal. The error in the eigenvalues of the matrix generated by SVD_EIG was comparable to the singular value error for matrices of dimension up to 400. Thereafter, the error in the eigenvalues grew quickly. When the matrix dimension doubled from 400 to 800, the error increased by a factor of several hundred (from 3.3e-13 to 1.5e-10). And when the matrix dimension doubled again from 800 to 1600, the error increased by a factor of several million (from 1.5e-10 to 7.7e-4).

A  recursive  algorithm  can  require  a  significant  amount  of  memory.  While  SVD_EIG 
executed,  we  monitored  the  memory  usage  using  the  Unix  "top"  command.  We  observed 
that  for  a  matrix  of  dimension  1600,  the  memory  consumption  grew  to  319  MB.  Since  a 
complex  double  precision  matrix  of  dimension  1600  occupies  about  41  MB  memory,  the 
recursion  required  more  than  7  times  as  much  space  as  the  matrix  itself. 
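The memory figures quoted above can be reproduced with a short calculation. It assumes 16 bytes per complex double precision entry (8 bytes each for the real and imaginary parts) and 1 MB = $10^6$ bytes; neither convention is stated explicitly in the text.

```latex
\begin{aligned}
1600^2 \times 16 \ \text{bytes} &= 40{,}960{,}000 \ \text{bytes} \approx 41 \ \text{MB}, \\
\frac{319 \ \text{MB}}{41 \ \text{MB}} &\approx 7.8 .
\end{aligned}
```

The ratio of about 7.8 is consistent with the statement that the recursion needed more than 7 times the storage of the matrix itself.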
6.4 Conclusions

In this chapter, we introduce two novel matrix decomposition algorithms: the geometric mean decomposition and the generalized triangular decomposition. The GMD $H = QRP^*$ is a solution of the maximin problem (6.3); the smallest diagonal element of $R$ is as large as possible. Starting with the SVD, we show that the GMD can be computed using a series of Givens rotations and row and column exchanges. Alternatively, the GMD could be computed directly, without performing an initial SVD, if $H$ is first reduced by unitary transformations to a real matrix. In a further extension of our algorithm for the GMD, we show in [51] how to compute a factorization $H = QRP^*$ where the diagonal of $R$ is any vector satisfying the Weyl multiplicative majorization conditions [69]. The GTD represents the most general unitary decomposition $H = QRP^*$. That is, the diagonal $\mathbf{r}$ of $R$ must satisfy $\mathbf{r} \preceq \boldsymbol{\sigma}$, where $\boldsymbol{\sigma}$ is the vector of singular values of $H$, while for any $\mathbf{r}$ with $\mathbf{r} \preceq \boldsymbol{\sigma}$, we can write $H = QRP^*$. The GTD includes, as special cases, the singular value decomposition, the Schur decomposition, the QR decomposition, and the GMD. Similar to the GMD, given the SVD, the GTD based on $\mathbf{r}$ can also be evaluated using a series of Givens rotations and permutations. The GTD algorithm provides a new proof of Horn's theorem [70]. Applications of the GTD include transceiver design for MIMO communications [72] and inverse eigenvalue problems surveyed extensively in [73].

Appendix 
Matlab  Implementation  of  GTD 

% Input:
%
%   H = U*S*V' (singular value decomposition of H)
%   U and V orthonormal columns
%   S diagonal matrix with nonnegative diagonal entries
%   r desired diagonal of R
%     -- nnz (r) = nnz (S)
%     -- r multiplicatively majorized by diag (S)
%     -- product nonzero r = product nonzero diag (S)
%
% Output:
%
%   H = Q*R*P' (GTD based on r)
%   P and Q orthonormal columns
%   R upper triangular, R (i, i) = r (i)

function [Q, R, P] = gtd (U, S, V, r)

d = diag (S) ;
K = min (size (S)) ;
P = V ; Q = U ; R = zeros (K) ;
for k = 1 : K-1
    rk = r (k) ;
    abs_rk = abs (rk) ;
    kp1 = k + 1 ; km1 = k - 1 ;
    % p: smallest trailing diagonal entry with magnitude above |r(k)|
    I = find (abs (d (k : K)) > abs_rk) ;
    if ( isempty (I) )
        [x, p] = max (abs (d (k : K))) ;
        p = p + km1 ;
    else
        I = I + km1 ;
        [x, p] = min (abs (d (I))) ;
        p = I (p) ;
    end
    delta1 = d (p) ;
    d ([k p]) = d ([p k]) ;
    % q: largest remaining diagonal entry with magnitude at most |r(k)|
    I = find (abs (d (kp1 : K)) <= abs_rk) ;
    if ( isempty (I) )
        [x, q] = min (abs (d (kp1 : K))) ;
        q = q + k ;
    else
        I = I + k ;
        [x, q] = max (abs (d (I))) ;
        q = I (q) ;
    end
    delta2 = d (q) ;
    d ([kp1 q]) = d ([q kp1]) ;
    sq_delta1 = abs (delta1)^2 ;
    sq_delta2 = abs (delta2)^2 ;
    sq_rk = abs_rk^2 ;
    denom = sq_delta1 - sq_delta2 ;
    if ( (denom <= 0) | (sq_rk > sq_delta1) )
        c = 1 ; s = 0 ;
    elseif ( sq_rk < sq_delta2 )
        c = 0 ; s = 1 ;
    else
        c = sqrt ((sq_rk - sq_delta2)/denom) ;
        s = sqrt (1 - c*c) ;
    end
    if ( sq_rk > 0 )
        x = -s*c*rk*denom/sq_rk ;
        y = delta1*delta2*rk/sq_rk ;
        G1 = [ c -s
               s  c ] ;
        G2 = [ c*delta1 -s*(delta2')
               s*delta2  c*(delta1') ] ;
        G2 = ((rk')/sq_rk) * G2 ;
    else
        x = 0. ;
        y = delta1 ;
        G1 = [ 0 -1
               1  0 ] ;
        G2 = G1 ;
    end
    if ( k > 1 )
        % permute the columns
        R (1:km1, [k p]) = R (1:km1, [p k]) ;
        R (1:km1, [kp1 q]) = R (1:km1, [q kp1]) ;
        % apply G1 to R
        R (1:km1, [k kp1]) = R (1:km1, [k kp1])*G1 ;
    end
    R (k, k) = rk ;
    R (k, kp1) = x ;
    d (kp1) = y ;
    % permute the columns
    P (:, [k p]) = P (:, [p k]) ;
    P (:, [kp1 q]) = P (:, [q kp1]) ;
    Q (:, [k p]) = Q (:, [p k]) ;
    Q (:, [kp1 q]) = Q (:, [q kp1]) ;
    % apply G1 to P
    P (:, [k kp1]) = P (:, [k kp1])*G1 ;
    % apply G2 to Q
    Q (:, [k kp1]) = Q (:, [k kp1])*G2 ;
end
R (K, K) = r (K) ;
if ( r (K) ~= 0 )
    Q (:, K) = Q (:, K)*d (K)/r (K) ;
end
P = P (:, 1:K) ;
Q = Q (:, 1:K) ;


CHAPTER  7 
CONCLUSIONS 

This dissertation studies the signal processing aspect of MIMO communications. We present a new perspective on MIMO communications: any MIMO scheme can be regarded as a MIMO channel decomposer, which decomposes a MIMO channel into multiple scalar subchannels. Based on this perspective, this dissertation presents three novel MIMO transceiver designs: the geometric mean decomposition (GMD) scheme, the uniform channel decomposition (UCD) scheme, and the tunable channel decomposition (TCD) scheme. All these schemes deploy either a VBLAST detector at the receiver or a dirty paper precoder at the transmitter. These transceiver designs represent a paradigm shift from the conventional linear MIMO transceiver designs to nonlinear ones. The superior performance of the GMD and UCD schemes unveils the practical significance of collaboration between the transmitter and the receiver. That is, such collaboration facilitates achieving the optimal tradeoff between the diversity and multiplexing gains promised by MIMO communication theory. The TCD scheme represents a unifying solution to a considerably wide range of problems, including designing the precoder for OFDM communications and the optimal CDMA sequences.

Motivated by the application of transceiver designs, this dissertation also introduces two novel matrix decomposition algorithms, i.e., the geometric mean decomposition (GMD) and the generalized triangular decomposition (GTD). The two matrix decompositions are the cornerstones of the three transceiver designs proposed in this dissertation. Moreover, the two matrix decomposition algorithms have significant implications in the matrix analysis community. For instance, the GTD is a new and more efficient solution to the inverse eigenvalue problem.
It has recently been discovered that deploying multiple transmitting and multiple receiving antennas in a wireless communication system can drastically improve the data rate and reliability of wireless communications, even without consuming additional bandwidth and input power. This so-called multi-input multi-output (MIMO) technology has been under intense research and will be applied to the next generation of wireless communication networks.

This dissertation focuses on developing practical transceiver designs for MIMO systems with sound theoretical foundations. Three designs, i.e., the GMD, UCD, and TCD schemes, are proposed. These designs represent a paradigm shift from the conventional linear designs to nonlinear designs while keeping the implementation complexity low. It is shown, both through theoretical analyses and numerical simulations, that the three designs are much better than their linear counterparts in that they can achieve faster and more reliable communications. The schemes proposed in this dissertation will probably play an important role in the next-generation wireless fidelity (Wi-Fi) and digital subscriber line (DSL) technologies. Besides its engineering significance, this dissertation also introduces two matrix decompositions, which are significant contributions to the numerical analysis community.